Preface


“Audience, level, 
and treatment — 
a description of 
such matters is 
what prefaces are 
supposed to be 
about.” 

— P. R. Halmos [173]


“People do acquire a 
little brief author- 
ity by equipping 
themselves with 
jargon: they can 
pontificate and air a 
superficial expertise. 
But what we should 
ask of educated 
mathematicians is 
not what they can 
speechify about, 
nor even what they 
know about the 
existing corpus 
of mathematical 
knowledge, but 
rather what can 
they now do with 
their learning and 
whether they can 
actually solve math- 
ematical problems 
arising in practice. 

In short, we look for 
deeds not words. ” 

— J. Hammersley [176]


THIS BOOK IS BASED on a course of the same name that has been taught 
annually at Stanford University since 1970. About fifty students have taken it 
each year — juniors and seniors, but mostly graduate students — and alumni 
of these classes have begun to spawn similar courses elsewhere. Thus the time 
seems ripe to present the material to a wider audience (including sophomores). 

It was a dark and stormy decade when Concrete Mathematics was born. 
Long-held values were constantly being questioned during those turbulent 
years; college campuses were hotbeds of controversy. The college curriculum 
itself was challenged, and mathematics did not escape scrutiny. John Ham- 
mersley had just written a thought-provoking article “On the enfeeblement of 
mathematical skills by ‘Modern Mathematics’ and by similar soft intellectual 
trash in schools and universities” [176]; other worried mathematicians [332] 
even asked, “Can mathematics be saved?” One of the present authors had 
embarked on a series of books called The Art of Computer Programming, and 
in writing the first volume he (DEK) had found that there were mathematical 
tools missing from his repertoire; the mathematics he needed for a thorough, 
well-grounded understanding of computer programs was quite different from 
what he’d learned as a mathematics major in college. So he introduced a new 
course, teaching what he wished somebody had taught him. 

The course title “Concrete Mathematics” was originally intended as an 
antidote to “Abstract Mathematics,” since concrete classical results were rap- 
idly being swept out of the modern mathematical curriculum by a new wave 
of abstract ideas popularly called the “New Math.” Abstract mathematics is a 
wonderful subject, and there’s nothing wrong with it: It’s beautiful, general, 
and useful. But its adherents had become deluded that the rest of mathemat- 
ics was inferior and no longer worthy of attention. The goal of generalization 
had become so fashionable that a generation of mathematicians had become 
unable to relish beauty in the particular, to enjoy the challenge of solving 
quantitative problems, or to appreciate the value of technique. Abstract math- 
ematics was becoming inbred and losing touch with reality; mathematical ed- 
ucation needed a concrete counterweight in order to restore a healthy balance. 

When DEK taught Concrete Mathematics at Stanford for the first time, 
he explained the somewhat strange title by saying that it was his attempt 
to teach a math course that was hard instead of soft. He announced that,
contrary to the expectations of some of his colleagues, he was not going to 
teach the Theory of Aggregates, nor Stone’s Embedding Theorem, nor even 
the Stone-Čech compactification. (Several students from the civil engineering
department got up and quietly left the room.) 

Although Concrete Mathematics began as a reaction against other trends, 
the main reasons for its existence were positive instead of negative. And as 
the course continued its popular place in the curriculum, its subject matter 
“solidified” and proved to be valuable in a variety of new applications. Mean- 
while, independent confirmation for the appropriateness of the name came 
from another direction, when Z. A. Melzak published two volumes entitled 
Companion to Concrete Mathematics [267]. 

The material of concrete mathematics may seem at first to be a disparate 
bag of tricks, but practice makes it into a disciplined set of tools. Indeed, the 
techniques have an underlying unity and a strong appeal for many people. 
When another one of the authors (RLG) first taught the course in 1979, the 
students had such fun that they decided to hold a class reunion a year later. 

But what exactly is Concrete Mathematics? It is a blend of CONtinuous 
and disCRETE mathematics. More concretely, it is the controlled manipulation 
of mathematical formulas, using a collection of techniques for solving prob- 
lems. Once you, the reader, have learned the material in this book, all you 
will need is a cool head, a large sheet of paper, and fairly decent handwriting 
in order to evaluate horrendous-looking sums, to solve complex recurrence 
relations, and to discover subtle patterns in data. You will be so fluent in 
algebraic techniques that you will often find it easier to obtain exact results 
than to settle for approximate answers that are valid only in a limiting sense. 

The major topics treated in this book include sums, recurrences, ele- 
mentary number theory, binomial coefficients, generating functions, discrete 
probability, and asymptotic methods. The emphasis is on manipulative tech- 
nique rather than on existence theorems or combinatorial reasoning; the goal 
is for each reader to become as familiar with discrete operations (like the 
greatest-integer function and finite summation) as a student of calculus is 
familiar with continuous operations (like the absolute-value function and in- 
finite integration). 

Notice that this list of topics is quite different from what is usually taught 
nowadays in undergraduate courses entitled “Discrete Mathematics.” There- 
fore the subject needs a distinctive name, and “Concrete Mathematics” has 
proved to be as suitable as any other. 

The original textbook for Stanford’s course on concrete mathematics was 
the “Mathematical Preliminaries” section in The Art of Computer Program- 
ming [207]. But the presentation in those 110 pages is quite terse, so another 
author (OP) was inspired to draft a lengthy set of supplementary notes. The 


“The heart of math- 
ematics consists 
of concrete exam- 
ples and concrete 
problems.”

— P. R. Halmos [172]


“It is downright 
sinful to teach the 
abstract before the 
concrete. ” 

— Z. A. Melzak [267] 


Concrete Mathe- 
matics is a bridge 
to abstract mathe- 
matics. 


“The advanced 
reader who skips 
parts that appear 
too elementary may 
miss more than 
the less advanced 
reader who skips 
parts that appear 
too complex.” 

— G. Polya [297] 


(We’re not bold 
enough to try 
Distinuous Math- 
ematics.) 
“. . . a concrete
life preserver 
thrown to students 
sinking in a sea of 
abstraction.” 

— W. Gottschalk 


Math graffiti: 

Kilroy wasn’t Haar.
Free the group. 
Nuke the kernel. 
Power to the n. 
N = 1 ⇒ P = NP.


I have only a 
marginal interest 
in this subject. 


This was the most 
enjoyable course 
I’ve ever had. But 
it might be nice 
to summarize the 
material as you 
go along. 


present book is an outgrowth of those notes; it is an expansion of, and a more 
leisurely introduction to, the material of Mathematical Preliminaries. Some of 
the more advanced parts have been omitted; on the other hand, several topics 
not found there have been included here so that the story will be complete. 

The authors have enjoyed putting this book together because the subject 
began to jell and to take on a life of its own before our eyes; this book almost 
seemed to write itself. Moreover, the somewhat unconventional approaches 
we have adopted in several places have seemed to fit together so well, after 
these years of experience, that we can’t help feeling that this book is a kind 
of manifesto about our favorite way to do mathematics. So we think the book 
has turned out to be a tale of mathematical beauty and surprise, and we hope 
that our readers will share at least ε of the pleasure we had while writing it.

Since this book was born in a university setting, we have tried to capture 
the spirit of a contemporary classroom by adopting an informal style. Some 
people think that mathematics is a serious business that must always be cold 
and dry; but we think mathematics is fun, and we aren’t ashamed to admit 
the fact. Why should a strict boundary line be drawn between work and 
play? Concrete mathematics is full of appealing patterns; the manipulations 
are not always easy, but the answers can be astonishingly attractive. The 
joys and sorrows of mathematical work are reflected explicitly in this book 
because they are part of our lives. 

Students always know better than their teachers, so we have asked the 
first students of this material to contribute their frank opinions, as “graffiti” 
in the margins. Some of these marginal markings are merely corny, some 
are profound; some of them warn about ambiguities or obscurities, others 
are typical comments made by wise guys in the back row; some are positive, 
some are negative, some are zero. But they all are real indications of feelings 
that should make the text material easier to assimilate. (The inspiration for 
such marginal notes comes from a student handbook entitled Approaching 
Stanford, where the official university line is counterbalanced by the remarks 
of outgoing students. For example, Stanford says, “There are a few things 
you cannot miss in this amorphous shape which is Stanford”; the margin 
says, “Amorphous . . . what the h*** does that mean? Typical of the pseudo- 
intellectualism around here.” Stanford: “There is no end to the potential of 
a group of students living together.” Graffito: “Stanford dorms are like zoos 
without a keeper.” ) 

The margins also include direct quotations from famous mathematicians 
of past generations, giving the actual words in which they announced some 
of their fundamental discoveries. Somehow it seems appropriate to mix the 
words of Leibniz, Euler, Gauss, and others with those of the people who 
will be continuing the work. Mathematics is an ongoing endeavor for people 
everywhere; many strands are being woven into one rich fabric. 
This book contains more than 500 exercises, divided into six categories:

• Warmups are exercises that every reader should try to do when first 
reading the material. 

• Basics are exercises to develop facts that are best learned by trying 
one’s own derivation rather than by reading somebody else’s. 

• Homework exercises are problems intended to deepen an understand- 
ing of material in the current chapter. 

• Exam problems typically involve ideas from two or more chapters si- 
multaneously; they are generally intended for use in take-home exams 
(not for in-class exams under time pressure). 

• Bonus problems go beyond what an average student of concrete math- 
ematics is expected to handle while taking a course based on this book; 
they extend the text in interesting ways. 

• Research problems may or may not be humanly solvable, but the ones 
presented here seem to be worth a try (without time pressure). 

Answers to all the exercises appear in Appendix A, often with additional infor- 
mation about related results. (Of course, the “answers” to research problems 
are incomplete; but even in these cases, partial results or hints are given that 
might prove to be helpful.) Readers are encouraged to look at the answers, 
especially the answers to the warmup problems, but only after making a 
serious attempt to solve the problem without peeking. 

We have tried in Appendix C to give proper credit to the sources of 
each exercise, since a great deal of creativity and/or luck often goes into 
the design of an instructive problem. Mathematicians have unfortunately 
developed a tradition of borrowing exercises without any acknowledgment; 
we believe that the opposite tradition, practiced for example by books and 
magazines about chess (where names, dates, and locations of original chess 
problems are routinely specified) is far superior. However, we have not been 
able to pin down the sources of many problems that have become part of the 
folklore. If any reader knows the origin of an exercise for which our citation 
is missing or inaccurate, we would be glad to learn the details so that we can 
correct the omission in subsequent editions of this book. 

The typeface used for mathematics throughout this book is a new design 
by Hermann Zapf [227], commissioned by the American Mathematical Society 
and developed with the help of a committee that included B. Beeton, R. P. 
Boas, L. K. Durst, D. E. Knuth, P. Murdock, R. S. Palais, P. Renz, E. Swanson, 
S. B. Whidden, and W. B. Woolf. The underlying philosophy of Zapf’s design 
is to capture the flavor of mathematics as it might be written by a mathemati- 
cian with excellent handwriting. A handwritten rather than mechanical style 
is appropriate because people generally create mathematics with pen, pencil, 


I see: 

Concrete mathemat- 
ics means drilling. 


The homework was 
tough but I learned 
a lot. It was worth 
every hour. 


Take-home exams 
are vital — keep 
them. 

Exams were harder 
than the homework 
led me to expect. 


Cheaters may pass 
this course by just 
copying the an- 
swers, but they’re 
only cheating 
themselves. 


Difficult exams 
don’t take into ac-
count students who 
have other classes 
to prepare for. 
I’m unaccustomed
to this face. 


Dear prof: Thanks 
for (1) the puns,
(2) the subject 
matter. 


I don’t see how
what I’ve learned 
will ever help me. 


I had a lot of trou- 
ble in this class, but 
I know it sharpened 
my math skills and 
my thinking skills. 


I would advise the 
casual student to 
stay away from this 
course. 


or chalk. (For example, one of the trademarks of the new design is the symbol 
for zero, ‘0’, which is slightly pointed at the top because a handwritten zero
rarely closes together smoothly when the curve returns to its starting point.) 
The letters are upright, not italic, so that subscripts, superscripts, and ac- 
cents are more easily fitted with ordinary symbols. This new type family has 
been named AMS Euler, after the great Swiss mathematician Leonhard Euler 
(1707–1783) who discovered so much of mathematics as we know it today.
The alphabets include Euler Text (Aa Bb Cc through Xx Yy Zz), Euler Fraktur
(𝔄𝔞 𝔅𝔟 ℭ𝔠 through 𝔛𝔵 𝔜𝔶 ℨ𝔷), and Euler Script Capitals (𝒜 ℬ 𝒞 through
𝒳 𝒴 𝒵), as well as Euler Greek (Aα Bβ Γγ through Xχ Ψψ Ωω) and special
symbols such as ℘ and ℵ. We are especially pleased to be able to inaugurate
the Euler family of typefaces in this book, because Leonhard Euler’s spirit 
truly lives on every page: Concrete mathematics is Eulerian mathematics. 

The authors are extremely grateful to Andrei Broder, Ernst Mayr, Andrew
Yao, and Frances Yao, who contributed greatly to this book during the
years that they taught Concrete Mathematics at Stanford. Furthermore we 
offer 1024 thanks to the teaching assistants who creatively transcribed what 
took place in class each year and who helped to design the examination ques- 
tions; their names are listed in Appendix C. This book, which is essentially 
a compendium of sixteen years’ worth of lecture notes, would have been im- 
possible without their first-rate work. 

Many other people have helped to make this book a reality. For example, 
we wish to commend the students at Brown, Columbia, CUNY, Princeton, 
Rice, and Stanford who contributed the choice graffiti and helped to debug 
our first drafts. Our contacts at Addison-Wesley were especially efficient
and helpful; in particular, we wish to thank our publisher (Peter Gordon), 
production supervisor (Bette Aaronson), designer (Roy Brown), and copy editor
(Lyn Dupré). The National Science Foundation and the Office of Naval
Research have given invaluable support. Cheryl Graham was tremendously 
helpful as we prepared the index. And above all, we wish to thank our wives 
(Fan, Jill, and Amy) for their patience, support, encouragement, and ideas. 

This second edition features a new Section 5.8, which describes some 
important ideas that Doron Zeilberger discovered shortly after the first edition 
went to press. Additional improvements to the first printing can also be found 
on almost every page. 

We have tried to produce a perfect book, but we are imperfect authors. 
Therefore we solicit help in correcting any mistakes that we’ve made. A re- 
ward of $2.56 will gratefully be paid to the first finder of any error, whether 
it is mathematical, historical, or typographical. 

Murray Hill, New Jersey,          RLG
and Stanford, California          DEK
May 1988 and October 1993         OP



A Note on Notation 


SOME OF THE SYMBOLISM in this book has not (yet?) become standard. 
Here is a list of notations that might be unfamiliar to readers who have learned 
similar material from other books, together with the page numbers where 
these notations are explained. (See the general index, at the end of the book, 
for references to more standard notations.) 


Notation                    Name                                                        Page

$\ln x$                     natural logarithm: $\log_e x$                               276
$\lg x$                     binary logarithm: $\log_2 x$                                70
$\log x$                    common logarithm: $\log_{10} x$                             449
$\lfloor x \rfloor$         floor: $\max\{\,n \mid n \le x,\ \text{integer } n\,\}$     67
$\lceil x \rceil$           ceiling: $\min\{\,n \mid n \ge x,\ \text{integer } n\,\}$   67
$x \bmod y$                 remainder: $x - y\lfloor x/y \rfloor$                       82
$\{x\}$                     fractional part: $x \bmod 1$                                70
$\sum f(x)\,\delta x$       indefinite summation                                        48
$\sum_a^b f(x)\,\delta x$   definite summation                                          49
$x^{\underline{n}}$         falling factorial power: $x!/(x-n)!$                        47, 211
$x^{\overline{n}}$          rising factorial power: $\Gamma(x+n)/\Gamma(x)$             48, 211
$n\,\text{¡}$               subfactorial: $n!/0! - n!/1! + \cdots + (-1)^n n!/n!$       194
$\Re z$                     real part: $x$, if $z = x + iy$                             64
$\Im z$                     imaginary part: $y$, if $z = x + iy$                        64
$H_n$                       harmonic number: $1/1 + \cdots + 1/n$                       29
$H_n^{(x)}$                 generalized harmonic number: $1/1^x + \cdots + 1/n^x$       277


If you don’t under-
stand what the
x denotes at the
bottom of this page,
try asking your
Latin professor
instead of your
math professor.





Prestressed concrete 
mathematics is con- 
crete mathematics 
that’s preceded by 
a bewildering list 
of notations. 


Also ‘nonstring’ is 
a string. 


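$f^{(m)}(z)$                                  $m$th derivative of $f$ at $z$                            470
$\genfrac{[}{]}{0pt}{}{n}{m}$                 Stirling cycle number (the “first kind”)                  259
$\genfrac{\{}{\}}{0pt}{}{n}{m}$               Stirling subset number (the “second kind”)                258
$\genfrac{\langle}{\rangle}{0pt}{}{n}{m}$     Eulerian number                                           267
$\bigl\langle\genfrac{\langle}{\rangle}{0pt}{}{n}{m}\bigr\rangle$   Second-order Eulerian number        270
$(a_m \ldots a_0)_b$                          radix notation for $\sum_{k=0}^{m} a_k b^k$               11
$K(x_1, \ldots, x_n)$                         continuant polynomial                                     302
$F\Bigl(\genfrac{}{}{0pt}{}{a_1, \ldots, a_m}{b_1, \ldots, b_n} \Bigm| z\Bigr)$   hypergeometric function   205
$\#A$                                         cardinality: number of elements in the set $A$            39
$[z^n]\,f(z)$                                 coefficient of $z^n$ in $f(z)$                            197
$[\alpha\,..\,\beta]$                         closed interval: the set $\{x \mid \alpha \le x \le \beta\}$   73
$[m = n]$                                     1 if $m = n$, otherwise 0*                                24
$[m \backslash n]$                            1 if $m$ divides $n$, otherwise 0*                        102
$[m \backslash\backslash n]$                  1 if $m$ exactly divides $n$, otherwise 0*                146
$[m \perp n]$                                 1 if $m$ is relatively prime to $n$, otherwise 0*         115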


*In general, if S is any statement that can be true or false, the bracketed 
notation [S] stands for 1 if S is true, 0 otherwise. 
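For readers who like to experiment, here is a small Python sketch of a few of these notations. The function names (`mod`, `frac`, `iverson`, `falling`, `rising`, `subfactorial`, `harmonic`) are our own hypothetical choices, not anything defined in this book; the sketch restricts `mod` to $y \ne 0$ and the factorial powers to integers $n \ge 0$. Python's built-in `math.floor` and `math.ceil` already compute $\lfloor x \rfloor$ and $\lceil x \rceil$.

```python
from fractions import Fraction
from math import factorial, floor


def mod(x, y):
    """x mod y = x - y*floor(x/y); this sketch only covers y != 0."""
    if y == 0:
        raise ValueError("this sketch only handles y != 0")
    # Fraction keeps the quotient exact, so floor() is not fooled by rounding
    return x - y * floor(Fraction(x) / Fraction(y))


def frac(x):
    """Fractional part {x} = x mod 1."""
    return mod(x, 1)


def iverson(statement):
    """Bracket notation [S]: 1 if S is true, 0 otherwise."""
    return 1 if statement else 0


def falling(x, n):
    """Falling factorial power x(x-1)...(x-n+1), for integer n >= 0."""
    result = 1
    for k in range(n):
        result *= x - k
    return result


def rising(x, n):
    """Rising factorial power x(x+1)...(x+n-1), for integer n >= 0."""
    result = 1
    for k in range(n):
        result *= x + k
    return result


def subfactorial(n):
    """n!/0! - n!/1! + ... + (-1)^n n!/n!, using exact integer division."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def harmonic(n, r=1):
    """Generalized harmonic number 1/1^r + ... + 1/n^r as an exact Fraction."""
    return sum((Fraction(1, k ** r) for k in range(1, n + 1)), Fraction(0))
```

For instance, `mod(-7, 3)` is 2 and `mod(7, -3)` is $-2$, because the floor is taken before multiplying back; `iverson(12 % 4 == 0)` plays the role of $[4 \backslash 12]$.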

Throughout this text, we use single-quote marks (‘. . .’) to delimit text as
it is written, and double-quote marks (“. . .”) for a phrase as it is spoken. Thus,
the string of letters ‘string’ is sometimes called a “string.”

An expression of the form ‘a/bc’ means the same as ‘a/(bc)’. Moreover,
$\log x/\log y = (\log x)/(\log y)$ and $2n! = 2(n!)$.
Raise your hand
if you’ve never 
seen this. 

OK, the rest of 
you can cut to 
equation (1.1). 


Gold — wow. 

Are our disks made 
of concrete? 


Recurrent Problems 


THIS CHAPTER EXPLORES three sample problems that give a feel for 
what’s to come. They have two traits in common: They’ve all been investi- 
gated repeatedly by mathematicians; and their solutions all use the idea of 
recurrence, in which the solution to each problem depends on the solutions 
to smaller instances of the same problem. 

1.1 THE TOWER OF HANOI 

Let’s look first at a neat little puzzle called the Tower of Hanoi, 
invented by the French mathematician Édouard Lucas in 1883. We are given
a tower of eight disks, initially stacked in decreasing size on one of three pegs: 


The objective is to transfer the entire tower to one of the other pegs, moving 
only one disk at a time and never moving a larger one onto a smaller. 

Lucas [260] furnished his toy with a romantic legend about a much larger 
Tower of Brahma, which supposedly has 64 disks of pure gold resting on three 
diamond needles. At the beginning of time, he said, God placed these golden 
disks on the first needle and ordained that a group of priests should transfer 
them to the third, according to the rules above. The priests reportedly work 
day and night at their task. When they finish, the Tower will crumble and 
the world will end. 




It’s not immediately obvious that the puzzle has a solution, but a little 
thought (or having seen the problem before) convinces us that it does. Now 
the question arises: What’s the best we can do? That is, how many moves 
are necessary and sufficient to perform the task? 

The best way to tackle a question like this is to generalize it a bit. The 
Tower of Brahma has 64 disks and the Tower of Hanoi has 8; let’s consider 
what happens if there are n disks. 

One advantage of this generalization is that we can scale the problem 
down even more. In fact, we’ll see repeatedly in this book that it’s advanta- 
geous to LOOK AT small CASES first. It’s easy to see how to transfer a tower 
that contains only one or two disks. And a small amount of experimentation 
shows how to transfer a tower of three. 

The next step in solving the problem is to introduce appropriate notation: 
name and conquer. Let’s say that $T_n$ is the minimum number of moves
that will transfer $n$ disks from one peg to another under Lucas’s rules. Then
$T_1$ is obviously 1, and $T_2 = 3$.

We can also get another piece of data for free, by considering the smallest 
case of all: Clearly $T_0 = 0$, because no moves at all are needed to transfer a
tower of n = 0 disks! Smart mathematicians are not ashamed to think small, 
because general patterns are easier to perceive when the extreme cases are 
well understood (even when they are trivial). 

But now let’s change our perspective and try to think big; how can we 
transfer a large tower? Experiments with three disks show that the winning 
idea is to transfer the top two disks to the middle peg, then move the third, 
then bring the other two onto it. This gives us a clue for transferring n disks 
in general: We first transfer the $n - 1$ smallest to a different peg (requiring
$T_{n-1}$ moves), then move the largest (requiring one move), and finally transfer
the $n - 1$ smallest back onto the largest (requiring another $T_{n-1}$ moves). Thus
we can transfer $n$ disks (for $n > 0$) in at most $2T_{n-1} + 1$ moves:

$$T_n \le 2T_{n-1} + 1, \qquad \text{for } n > 0.$$

This formula uses ‘≤’ instead of ‘=’ because our construction proves only
that $2T_{n-1} + 1$ moves suffice; we haven’t shown that $2T_{n-1} + 1$ moves are
necessary. A clever person might be able to think of a shortcut. 
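The construction just described can be turned into a short recursive program. This is a minimal Python sketch (the names `hanoi_moves` and `check_moves` and the peg labels A, B, C are our own); it replays the moves to confirm that they obey Lucas’s rules and counts them. That confirms the upper bound for small $n$, but by itself says nothing about whether fewer moves might do.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves that transfer n disks to target.

    Follows the construction in the text: move the n-1 smallest disks out of
    the way, move the largest, then bring the n-1 smallest back on top of it.
    """
    if n == 0:
        return
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)


def check_moves(n, moves, source="A", target="C", spare="B"):
    """Replay moves under Lucas's rules; True if the whole tower ends on target."""
    pegs = {source: list(range(n, 0, -1)), target: [], spare: []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the disk is not on top of the peg it is taken from
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # would put a larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[target] == list(range(n, 0, -1))
```

For example, `list(hanoi_moves(2))` gives `[(1, 'A', 'B'), (2, 'A', 'C'), (1, 'B', 'C')]`, three moves in all.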

But is there a better way? Actually no. At some point we must move the
largest disk. When we do, the $n - 1$ smallest must be on a single peg, and it
has taken at least $T_{n-1}$ moves to put them there. We might move the largest
disk more than once, if we’re not too alert. But after moving the largest disk
for the last time, we must transfer the $n - 1$ smallest disks (which must again
be on a single peg) back onto the largest; this too requires $T_{n-1}$ moves. Hence

$$T_n \ge 2T_{n-1} + 1, \qquad \text{for } n > 0.$$


Most of the pub-
lished “solutions”
to Lucas’s problem,
like the early one
of Allardice and
Fraser [7], fail to ex-
plain why $T_n$ must
be $\ge 2T_{n-1} + 1$.
Yeah, yeah . . .

I seen that word 
before. 


These two inequalities, together with the trivial solution for $n = 0$, yield

$$\begin{aligned} T_0 &= 0;\\ T_n &= 2T_{n-1} + 1, \qquad \text{for } n > 0. \end{aligned} \tag{1.1}$$

(Notice that these formulas are consistent with the known values $T_1 = 1$ and
$T_2 = 3$. Our experience with small cases has not only helped us to discover
a general formula, it has also provided a convenient way to check that we 
haven’t made a foolish error. Such checks will be especially valuable when we 
get into more complicated maneuvers in later chapters.) 

A set of equalities like (1.1) is called a recurrence (a.k.a. recurrence
relation or recursion relation). It gives a boundary value and an equation for
the general value in terms of earlier ones. Sometimes we refer to the general 
equation alone as a recurrence, although technically it needs a boundary value 
to be complete. 
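Recurrence (1.1) translates directly into a loop. This Python sketch (the function name is ours) performs exactly the step-by-step evaluation from the boundary value: reaching $T_n$ takes $n$ applications of the general equation.

```python
def tower_of_hanoi(n):
    """Evaluate recurrence (1.1) bottom-up: T_0 = 0, T_n = 2*T_{n-1} + 1."""
    t = 0  # boundary value T_0
    for _ in range(n):
        t = 2 * t + 1  # general equation, applied once per additional disk
    return t


print([tower_of_hanoi(n) for n in range(7)])  # [0, 1, 3, 7, 15, 31, 63]
```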

The recurrence allows us to compute $T_n$ for any $n$ we like. But nobody
really likes to compute from a recurrence, when $n$ is large; it takes too long.
The recurrence only gives indirect, local information. A solution to the
recurrence would make us much happier. That is, we’d like a nice, neat,
“closed form” for $T_n$ that lets us compute it quickly, even for large $n$. With
a closed form, we can understand what $T_n$ really is.

So how do we solve a recurrence? One way is to guess the correct solution, 
then to prove that our guess is correct. And our best hope for guessing 
the solution is to look (again) at small cases. So we compute, successively, 
$T_3 = 2\cdot 3 + 1 = 7$; $T_4 = 2\cdot 7 + 1 = 15$; $T_5 = 2\cdot 15 + 1 = 31$; $T_6 = 2\cdot 31 + 1 = 63$.
Aha! It certainly looks as if 

$$T_n = 2^n - 1, \qquad \text{for } n \ge 0. \tag{1.2}$$


Mathematical in- 
duction proves that 
we can climb as 
high as we like on 
a ladder, by proving 
that we can climb 
onto the bottom 
rung (the basis) 
and that from each 
rung we can climb 
up to the next one 
(the induction). 


At least this works for n ≤ 6.

Mathematical induction is a general way to prove that some statement 
about the integer $n$ is true for all $n \ge n_0$. First we prove the statement
when $n$ has its smallest value, $n_0$; this is called the basis. Then we prove the
statement for $n > n_0$, assuming that it has already been proved for all values
between $n_0$ and $n - 1$, inclusive; this is called the induction. Such a proof
gives infinitely many results with only a finite amount of work. 

Recurrences are ideally set up for mathematical induction. In our case, 
for example, (1.2) follows easily from (1.1): The basis is trivial, since
$T_0 = 2^0 - 1 = 0$. And the induction follows for $n > 0$ if we assume that (1.2)
holds when $n$ is replaced by $n - 1$:

$$T_n = 2T_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1.$$

Hence (1.2) holds for $n$ as well. Good! Our quest for $T_n$ has ended successfully.
Of course the priests’ task hasn’t ended; they’re still dutifully moving
disks, and will be for a while, because for $n = 64$ there are $2^{64} - 1$ moves
(about 18 quintillion). Even at the impossible rate of one move per microsecond, they
will need more than 5000 centuries to transfer the Tower of Brahma. Lucas’s
original puzzle is a bit more practical. It requires $2^8 - 1 = 255$ moves, which
takes about four minutes for the quick of hand. 
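The arithmetic behind these estimates is easy to reproduce. The sketch below assumes a Julian year of 365.25 days; any reasonable choice of year length leads to the same conclusion.

```python
MOVES = 2 ** 64 - 1                # moves for the 64-disk Tower of Brahma
SECONDS = MOVES / 1_000_000        # at one move per microsecond
YEAR = 365.25 * 24 * 60 * 60       # assumed Julian year, in seconds
centuries = SECONDS / YEAR / 100

print(MOVES)             # 18446744073709551615, about 18 quintillion
print(round(centuries))  # 5845, comfortably more than 5000
print(2 ** 8 - 1)        # 255 moves for Lucas's eight-disk puzzle
```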

The Tower of Hanoi recurrence is typical of many that arise in applica- 
tions of all kinds. In finding a closed-form expression for some quantity of 
interest like $T_n$, we go through three stages:

1 Look at small cases. This gives us insight into the problem and helps us 
in stages 2 and 3. 

2 Find and prove a mathematical expression for the quantity of interest. 
For the Tower of Hanoi, this is the recurrence (1.1) that allows us, given
the inclination, to compute $T_n$ for any $n$.

3 Find and prove a closed form for our mathematical expression. For the
Tower of Hanoi, this is the recurrence solution (1.2).

The third stage is the one we will concentrate on throughout this book. In 
fact, we’ll frequently skip stages 1 and 2 entirely, because a mathematical 
expression will be given to us as a starting point. But even then, we’ll be 
getting into subproblems whose solutions will take us through all three stages. 

Our analysis of the Tower of Hanoi led to the correct answer, but it 
required an “inductive leap”; we relied on a lucky guess about the answer. 
One of the main objectives of this book is to explain how a person can solve 
recurrences without being clairvoyant. For example, we’ll see that recurrence 
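(1.1) can be simplified by adding 1 to both sides of the equations:

$$\begin{aligned} T_0 + 1 &= 1;\\ T_n + 1 &= 2T_{n-1} + 2, \qquad \text{for } n > 0. \end{aligned}$$

Now if we let $U_n = T_n + 1$, we have

$$\begin{aligned} U_0 &= 1;\\ U_n &= 2U_{n-1}, \qquad \text{for } n > 0. \end{aligned} \tag{1.3}$$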

It doesn’t take genius to discover that the solution to this recurrence is just 
$U_n = 2^n$; hence $T_n = 2^n - 1$. Even a computer could discover this.
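A quick numerical check of the substitution: the sketch below (function name ours) solves the simpler recurrence (1.3) for $U_n$ and subtracts 1, then confirms for every $n \le 64$ that the result equals $2^n - 1$ and satisfies (1.1). This checks finitely many cases; it does not replace the induction proof.

```python
def hanoi_via_u(n):
    """Solve (1.1) through U_n = T_n + 1, where U_0 = 1 and U_n = 2*U_{n-1}."""
    u = 1              # U_0 = T_0 + 1
    for _ in range(n):
        u = 2 * u      # the additive +1 of (1.1) has disappeared
    return u - 1       # undo the substitution: T_n = U_n - 1


# Compare with the closed form (1.2) and with recurrence (1.1) itself.
assert hanoi_via_u(0) == 0
for n in range(1, 65):
    assert hanoi_via_u(n) == 2 ** n - 1
    assert hanoi_via_u(n) == 2 * hanoi_via_u(n - 1) + 1
```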

1.2 LINES IN THE PLANE 

Our second sample problem has a more geometric flavor: How many 
slices of pizza can a person obtain by making n straight cuts with a pizza 
knife? Or, more academically: What is the maximum number $L_n$ of regions


What is a proof? 
“One half of one 
percent pure alco- 
hol.”


Interesting: We get 
rid of the +1 in 
(1.1) by adding, not 
by subtracting. 
(A pizza with Swiss
cheese?) 


A region is convex 
if it includes all
line segments be- 
tween any two of its 
points. (That’s not 
what my dictionary 
says, but it’s what 
mathematicians 
believe.) 


defined by n lines in the plane? This problem was first solved in 1826, by the 
Swiss mathematician Jacob Steiner [338]. 

Again we start by looking at small cases, remembering to begin with the 
smallest of all. The plane with no lines has one region; with one line it has 
two regions; and with two lines it has four regions: 


[Figure: the plane cut by zero, one, and two lines, showing $L_0 = 1$, $L_1 = 2$, and $L_2 = 4$ regions.]

(Each line extends infinitely in both directions.)

Sure, we think, $L_n = 2^n$; of course! Adding a new line simply doubles
the number of regions. Unfortunately this is wrong. We could achieve the 
doubling if the nth line would split each old region in two; certainly it can 
split an old region in at most two pieces, since each old region is convex. (A 
straight line can split a convex region into at most two new regions, which 
will also be convex.) But when we add the third line — the thick one in the 
diagram below — we soon find that it can split at most three of the old regions, 
no matter how we’ve placed the first two lines: 



Thus $L_3 = 4 + 3 = 7$ is the best we can do.

And after some thought we realize the appropriate generalization. The 
nth line (for n > 0) increases the number of regions by k if and only if it 
splits k of the old regions, and it splits k old regions if and only if it hits the 
previous lines in $k - 1$ different places. Two lines can intersect in at most one
point. Therefore the new line can intersect the $n - 1$ old lines in at most $n - 1$
different points, and we must have $k \le n$. We have established the upper
bound

$$L_n \le L_{n-1} + n, \qquad \text{for } n > 0.$$

Furthermore it’s easy to show by induction that we can achieve equality in 
this formula. We simply place the nth line in such a way that it’s not parallel 
to any of the others (hence it intersects them all), and such that it doesn’t go 
through any of the existing intersection points (hence it intersects them all
in different places). The recurrence is therefore 

$$\begin{aligned} L_0 &= 1;\\ L_n &= L_{n-1} + n, \qquad \text{for } n > 0. \end{aligned} \tag{1.4}$$

The known values of $L_1$, $L_2$, and $L_3$ check perfectly here, so we’ll buy this.
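As with the Tower of Hanoi, recurrence (1.4) is easy to evaluate mechanically. This Python sketch (the function name is ours) reproduces the values quoted in the text.

```python
def max_regions(n):
    """Evaluate recurrence (1.4): L_0 = 1, L_n = L_{n-1} + n."""
    regions = 1  # L_0: the plane with no lines is a single region
    for k in range(1, n + 1):
        regions += k  # the kth line, placed as in the text, splits k old regions
    return regions


print([max_regions(n) for n in range(6)])  # [1, 2, 4, 7, 11, 16]
```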

Now we need a closed-form solution. We could play the guessing game 
again, but 1, 2, 4, 7, 11, 16, . . . doesn’t look familiar; so let’s try another
tack. We can often understand a recurrence by “unfolding” or “unwinding” 
it all the way to the end, as follows: 


$$\begin{aligned} L_n &= L_{n-1} + n\\ &= L_{n-2} + (n-1) + n\\ &= L_{n-3} + (n-2) + (n-1) + n\\ &\;\;\vdots\\ &= L_0 + 1 + 2 + \cdots + (n-2) + (n-1) + n\\ &= 1 + S_n, \qquad \text{where } S_n = 1 + 2 + 3 + \cdots + (n-1) + n. \end{aligned}$$

Unfolding? I’d call this “plugging in.”

In other words, $L_n$ is one more than the sum $S_n$ of the first $n$ positive integers.

The quantity $S_n$ pops up now and again, so it’s worth making a table of
small values. Then we might recognize such numbers more easily when we 
see them the next time: 


n      1   2   3   4   5   6   7   8   9  10  11  12  13   14
S_n    1   3   6  10  15  21  28  36  45  55  66  78  91  105


These values are also called the triangular numbers, because $S_n$ is the number
of bowling pins in an $n$-row triangular array. For example, the usual
four-row array has $S_4 = 10$ pins.

To evaluate $S_n$ we can use a trick that Gauss reportedly came up with
in 1786, when he was nine years old [88] (see also Euler [114, part 1, §415]):

$$\begin{array}{rcl} S_n &=& 1 + 2 + 3 + \cdots + (n-1) + n\\ +\;S_n &=& n + (n-1) + (n-2) + \cdots + 2 + 1\\ \hline 2S_n &=& (n+1) + (n+1) + (n+1) + \cdots + (n+1) + (n+1) \end{array}$$


It seems a lot of 
stuff is attributed 
to Gauss — 
either he was really 
smart or he had a 
great press agent. 


We merely add $S_n$ to its reversal, so that each of the $n$ columns on the right
sums to $n + 1$. Simplifying,

$$S_n = \frac{n(n+1)}{2}, \qquad \text{for } n \ge 0. \tag{1.5}$$

Maybe he just had a magnetic personality.
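As a small illustration (a Python sketch, not from the book; the function names are our own), the pairing trick can be replayed literally: add the list $1, 2, \ldots, n$ to its reversal, confirm that every column sums to $n + 1$, and halve the total.

```python
def triangular_by_pairing(n: int) -> int:
    """S_n = 1 + 2 + ... + n, computed by adding the sum to its reversal."""
    forward = list(range(1, n + 1))        # 1, 2, ..., n
    backward = forward[::-1]               # n, n-1, ..., 1
    columns = [a + b for a, b in zip(forward, backward)]
    # Every one of the n columns sums to n + 1, so 2*S_n = n*(n + 1).
    assert all(c == n + 1 for c in columns)
    return sum(columns) // 2


def triangular_closed_form(n: int) -> int:
    """Closed form (1.5): S_n = n(n+1)/2."""
    return n * (n + 1) // 2


print([triangular_by_pairing(n) for n in range(1, 15)])
# [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105]
```

The printed list reproduces the table of $S_n$ given earlier.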
Actually Gauss is often called the greatest mathematician of all time. So it’s nice to be able to understand at least one of his discoveries.


When in doubt, look at the words. Why is it “closed,” as opposed to “open”? What image does it bring to mind?

Answer: The equation is “closed,” not defined in terms of itself, not leading to recurrence. The case is “closed”: it won’t happen again. Metaphors are the key.


Is “zig” a technical 
term? 


OK, we have our solution:

$$L_n = \frac{n(n+1)}{2} + 1, \qquad \text{for } n \ge 0. \tag{1.6}$$


As experts, we might be satisfied with this derivation and consider it 
a proof, even though we waved our hands a bit when doing the unfolding 
and reflecting. But students of mathematics should be able to meet stricter 
standards; so it’s a good idea to construct a rigorous proof by induction. The 
key induction step is 


$$L_n = L_{n-1} + n = \bigl(\tfrac12 (n-1)n + 1\bigr) + n = \tfrac12 n(n+1) + 1.$$


Now there can be no doubt about the closed form (1.6).
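For readers who like to see numbers, here is a minimal Python check (our own sketch; it tests small cases only and is no substitute for the induction above) that recurrence (1.4) and closed form (1.6) agree.

```python
def lines_by_recurrence(n: int) -> int:
    """L_n from (1.4): L_0 = 1 and L_k = L_{k-1} + k for k > 0."""
    regions = 1
    for k in range(1, n + 1):
        regions += k          # the kth line can add at most k new regions
    return regions


def lines_closed_form(n: int) -> int:
    """L_n from (1.6): n(n+1)/2 + 1."""
    return n * (n + 1) // 2 + 1


print([lines_by_recurrence(n) for n in range(6)])  # [1, 2, 4, 7, 11, 16]
```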

Incidentally we’ve been talking about “closed forms” without explic- 
itly saying what we mean. Usually it’s pretty clear. Recurrences like (1.1) 
and (1.4) are not in closed form — they express a quantity in terms of itself; 
but solutions like (1.2) and (1.6) are. Sums like $1 + 2 + \cdots + n$ are not in
closed form (they cheat by using ‘$\cdots$’), but expressions like $n(n+1)/2$ are.
We could give a rough definition like this: An expression for a quantity $f(n)$
is in closed form if we can compute it using at most a fixed number of “well
known” standard operations, independent of $n$. For example, $2^n - 1$ and
$n(n+1)/2$ are closed forms because they involve only addition, subtraction,
multiplication, division, and exponentiation, in explicit ways.

The total number of simple closed forms is limited, and there are recur- 
rences that don’t have simple closed forms. When such recurrences turn out 
to be important, because they arise repeatedly, we add new operations to our 
repertoire; this can greatly extend the range of problems solvable in “simple” 
closed form. For example, the product of the first n integers, n!, has proved 
to be so important that we now consider it a basic operation. The formula 
‘$n!$’ is therefore in closed form, although its equivalent ‘$1 \cdot 2 \cdot \ldots \cdot n$’ is not.

And now, briefly, a variation of the lines-in-the-plane problem: Suppose 
that instead of straight lines we use bent lines, each containing one “zig.” 
What is the maximum number $Z_n$ of regions determined by $n$ such bent lines
in the plane? We might expect $Z_n$ to be about twice as big as $L_n$, or maybe
three times as big. Let’s see: 
From these small cases, and after a little thought, we realize that a bent
line is like two straight lines except that regions merge when the “two” lines 
don’t extend past their intersection point. 



Regions 2, 3, and 4, which would be distinct with two lines, become a single 
region when there’s a bent line; we lose two regions. However, if we arrange 
things properly — the zig point must lie “beyond” the intersections with the 
other lines — that’s all we lose; that is, we lose only two regions per line. Thus 

$$Z_n = L_{2n} - 2n = \frac{2n(2n+1)}{2} + 1 - 2n = 2n^2 - n + 1, \qquad \text{for } n \ge 0. \tag{1.7}$$

Comparing the closed forms (1.6) and (1.7), we find that for large $n$,

$$L_n \sim \tfrac12 n^2, \qquad Z_n \sim 2n^2;$$

so we get about four times as many regions with bent lines as with straight
lines. (In later chapters we’ll be discussing how to analyze the approximate
behavior of integer functions when $n$ is large. The ‘$\sim$’ symbol is defined in
Section 9.1.)
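A quick numerical sketch of the same comparison (our own Python, assuming only formulas (1.6) and (1.7)): it checks $Z_n = L_{2n} - 2n$ against $2n^2 - n + 1$ and shows the ratio $Z_n/L_n$ creeping toward 4.

```python
def lines(n: int) -> int:
    """L_n = n(n+1)/2 + 1, closed form (1.6)."""
    return n * (n + 1) // 2 + 1


def bent_lines(n: int) -> int:
    """Z_n = L_{2n} - 2n: each bent line acts like two lines but loses two regions."""
    return lines(2 * n) - 2 * n


for n in (1, 2, 10, 1000):
    # Z_n should equal 2n^2 - n + 1, the closed form (1.7)
    assert bent_lines(n) == 2 * n * n - n + 1
    print(n, bent_lines(n), round(bent_lines(n) / lines(n), 3))
```

The ratio is only about 1.75 at $n = 2$ and already near 3.994 at $n = 1000$, which is what “for large $n$” means here.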

1.3 THE JOSEPHUS PROBLEM 

Our final introductory example is a variant of an ancient problem 
named for Flavius Josephus, a famous historian of the first century. Legend 
has it that Josephus wouldn’t have lived to become famous without his math- 
ematical talents. During the Jewish-Roman war, he was among a band of 41 
Jewish rebels trapped in a cave by the Romans. Preferring suicide to capture, 
the rebels decided to form a circle and, proceeding around it, to kill every 
third remaining person until no one was left. But Josephus, along with an 
unindicted co-conspirator, wanted none of this suicide nonsense; so he quickly 
calculated where he and his friend should stand in the vicious circle. 

. . . and a little afterthought. . .

Exercise 18 has the details.

(Ahrens [5, vol. 2] and Herstein and Kaplansky [187] discuss the interesting history of this problem. Josephus himself [197] is a bit vague.)

. . . thereby saving his tale for us to hear.

Here’s a case where $n = 0$ makes no sense.

Even so, a bad guess isn’t a waste of time, because it gets us involved in the problem.

This is the tricky part: We have $J(2n) = \mathrm{newnumber}(J(n))$, where $\mathrm{newnumber}(k) = 2k - 1$.

In our variation, we start with $n$ people numbered 1 to $n$ around a circle,
and we eliminate every second remaining person until only one survives. For
example, here’s the starting configuration for $n = 10$:



The elimination order is 2, 4, 6, 8, 10, 3, 7, 1, 9, so 5 survives. The problem:
Determine the survivor’s number, $J(n)$.
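Before hunting for a formula, it can help to have a brute-force referee. The sketch below (our own Python, built on `collections.deque`; the name `josephus_order` is hypothetical) simply plays the game: it repeatedly moves one person to the back of the queue and eliminates the next.

```python
from collections import deque


def josephus_order(n: int) -> tuple[list[int], int]:
    """Eliminate every second person among 1..n; return (elimination order, survivor)."""
    circle = deque(range(1, n + 1))
    order = []
    while len(circle) > 1:
        circle.rotate(-1)                  # the person at the front is skipped (moved to the back)
        order.append(circle.popleft())     # the next person is eliminated
    return order, circle[0]


order, survivor = josephus_order(10)
print(order, survivor)   # [2, 4, 6, 8, 10, 3, 7, 1, 9] 5
```

This takes time proportional to $n$, so it is only a checking tool; the recurrence developed below is far faster.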

We just saw that $J(10) = 5$. We might conjecture that $J(n) = n/2$ when
$n$ is even; and the case $n = 2$ supports the conjecture: $J(2) = 1$. But a few
other small cases dissuade us: the conjecture fails for $n = 4$ and $n = 6$.


n       1   2   3   4   5   6
J(n)    1   1   3   1   3   5


It’s back to the drawing board; let’s try to make a better guess. Hmmm . . . 
J(n) always seems to be odd. And in fact, there’s a good reason for this: The 
first trip around the circle eliminates all the even numbers. Furthermore, if 
n itself is an even number, we arrive at a situation similar to what we began 
with, except that there are only half as many people, and their numbers have 
changed. 

So let’s suppose that we have $2n$ people originally. After the first go-round,
we’re left with



[Figure: the remaining people $1, 3, 5, 7, \ldots, 2n-3, 2n-1$ around the circle.]


and 3 will be the next to go. This is just like starting out with n people, except 
that each person’s number has been doubled and decreased by 1. That is,

$$J(2n) = 2J(n) - 1, \qquad \text{for } n \ge 1.$$

We can now go quickly to large $n$. For example, we know that $J(10) = 5$, so

$$J(20) = 2J(10) - 1 = 2 \cdot 5 - 1 = 9.$$

Similarly $J(40) = 17$, and we can deduce that $J(5 \cdot 2^m) = 2^{m+1} + 1$.
But what about the odd case? With $2n + 1$ people, it turns out that
person number 1 is wiped out just after person number $2n$, and we’re left with



Odd case? Hey, 
leave my brother 
out of it. 


Again we almost have the original situation with n people, but this time their 
numbers are doubled and increased by 1. Thus

$$J(2n+1) = 2J(n) + 1, \qquad \text{for } n \ge 1.$$

Combining these equations with $J(1) = 1$ gives us a recurrence that defines $J$
in all cases:

$$\begin{aligned} J(1) &= 1;\\ J(2n) &= 2J(n) - 1, &&\text{for } n \ge 1;\\ J(2n+1) &= 2J(n) + 1, &&\text{for } n \ge 1. \end{aligned} \tag{1.8}$$

Instead of getting $J(n)$ from $J(n-1)$, this recurrence is much more “efficient,”
because it reduces $n$ by a factor of 2 or more each time it’s applied. We could
compute $J(1000000)$, say, with only 19 applications of (1.8). But still, we seek
a closed form, because that will be even quicker and more informative. After 
all, this is a matter of life or death. 
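Recurrence (1.8) translates directly into a recursive function. This sketch (ours, not the book’s) also counts how many times (1.8) is applied, which confirms the figure of 19 applications for $J(1000000)$.

```python
def josephus_rec(n: int) -> tuple[int, int]:
    """Return (J(n), number of applications of (1.8)) for n >= 1."""
    if n == 1:
        return 1, 0                        # J(1) = 1
    j, steps = josephus_rec(n // 2)        # n = 2k or 2k + 1 with k = n // 2
    if n % 2 == 0:
        return 2 * j - 1, steps + 1        # J(2k) = 2J(k) - 1
    return 2 * j + 1, steps + 1            # J(2k+1) = 2J(k) + 1


print(josephus_rec(1_000_000))   # (951425, 19)
```

Each call halves $n$, so the number of applications is $\lfloor \lg n \rfloor$, and $2^{19} \le 10^6 < 2^{20}$.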

Our recurrence makes it possible to build a table of small values very 
quickly. Perhaps we’ll be able to spot a pattern and guess the answer. 


n       1 | 2   3 | 4   5   6   7 | 8   9  10  11  12  13  14  15 | 16
J(n)    1 | 1   3 | 1   3   5   7 | 1   3   5   7   9  11  13  15 |  1


Voila! It seems we can group by powers of 2 (marked by vertical lines in
the table); $J(n)$ is always 1 at the beginning of a group and it increases by 2
within a group. So if we write $n$ in the form $n = 2^m + l$, where $2^m$ is the
largest power of 2 not exceeding $n$ and where $l$ is what’s left, the solution to
our recurrence seems to be

$$J(2^m + l) = 2l + 1, \qquad \text{for } m \ge 0 \text{ and } 0 \le l < 2^m. \tag{1.9}$$

(Notice that if $2^m \le n < 2^{m+1}$, the remainder $l = n - 2^m$ satisfies $0 \le l < 2^{m+1} - 2^m = 2^m$.)

But there’s a simpler way! The key fact is that $J(2^m) = 1$ for all $m$, and this follows immediately from our first equation,

$$J(2n) = 2J(n) - 1.$$

Hence we know that the first person will survive whenever $n$ is a power of 2. And in the general case, when $n = 2^m + l$, the number of people is reduced to a power of 2 after there have been $l$ executions. The first remaining person at this point, the survivor, is number $2l + 1$.

We must now prove (1.9). As in the past we use induction, but this time
the induction is on $m$. When $m = 0$ we must have $l = 0$; thus the basis of
(1.9) reduces to $J(1) = 1$, which is true. The induction step has two parts,
depending on whether $l$ is even or odd. If $m > 0$ and $2^m + l = 2n$, then $l$ is
even and

$$J(2^m + l) = 2J(2^{m-1} + l/2) - 1 = 2(2l/2 + 1) - 1 = 2l + 1,$$

by (1.8) and the induction hypothesis; this is exactly what we want. A similar
proof works in the odd case, when $2^m + l = 2n + 1$. We might also note that
(1.8) implies the relation

$$J(2n+1) - J(2n) = 2.$$

Either way, the induction is complete and (1.9) is established.

To illustrate solution (1.9), let’s compute $J(100)$. In this case we have
$100 = 2^6 + 36$, so $J(100) = 2 \cdot 36 + 1 = 73$.
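In Python, closed form (1.9) needs only the bit length of $n$, since $2^m$ is the highest power of 2 not exceeding $n$. A minimal sketch (the function name is ours):

```python
def josephus_closed(n: int) -> int:
    """J(n) = 2l + 1, where n = 2^m + l and 0 <= l < 2^m (equation (1.9))."""
    m = n.bit_length() - 1       # 2^m is the largest power of 2 not exceeding n
    l = n - (1 << m)             # what's left over
    return 2 * l + 1


print(josephus_closed(100))      # 73
```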

Now that we’ve done the hard stuff (solved the problem) we seek the 
soft: Every solution to a problem can be generalized so that it applies to a 
wider class of problems. Once we’ve learned a technique, it’s instructive to 
look at it closely and see how far we can go with it. Hence, for the rest of this 
section, we will examine the solution (1.9) and explore some generalizations
of the recurrence (1.8). These explorations will uncover the structure that
underlies all such problems. 

Powers of 2 played an important role in our finding the solution, so it’s 
natural to look at the radix 2 representations of n and J(n). Suppose n’s 
binary expansion is 

$$n = (b_m b_{m-1} \ldots b_1 b_0)_2;$$

that is,

$$n = b_m 2^m + b_{m-1} 2^{m-1} + \cdots + b_1 2 + b_0,$$

where each $b_i$ is either 0 or 1 and where the leading bit $b_m$ is 1. Recalling
that $n = 2^m + l$, we have, successively,

$$\begin{aligned} n &= (1\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2,\\ l &= (0\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2,\\ 2l &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, 0)_2,\\ 2l + 1 &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, 1)_2,\\ J(n) &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, b_m)_2. \end{aligned}$$

(The last step follows because $J(n) = 2l + 1$ and because $b_m = 1$.) We have
proved that

$$J\bigl((b_m b_{m-1} \ldots b_1 b_0)_2\bigr) = (b_{m-1} \ldots b_1 b_0 b_m)_2, \tag{1.10}$$
that is, in the lingo of computer programming, we get $J(n)$ from $n$ by doing
a one-bit cyclic shift left! Magic. For example, if $n = 100 = (1100100)_2$ then
$J(n) = J((1100100)_2) = (1001001)_2$, which is $64 + 8 + 1 = 73$. If we had been
working all along in binary notation, we probably would have spotted this
pattern immediately.
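The one-bit cyclic shift is easy to express with Python’s binary strings. This sketch (ours) moves the leading 1 to the end; `int(..., 2)` then silently ignores any leading zeros in the rotated string.

```python
def josephus_rotate(n: int) -> int:
    """J(n) via (1.10): cyclically shift n's binary representation one bit left."""
    bits = bin(n)[2:]                # e.g. 100 -> '1100100'
    rotated = bits[1:] + bits[0]     # '1001001'
    return int(rotated, 2)


print(josephus_rotate(100))          # 73
```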

If we start with $n$ and iterate the $J$ function $m + 1$ times, we’re doing
$m + 1$ one-bit cyclic shifts; so, since $n$ is an $(m+1)$-bit number, we might
expect to end up with $n$ again. But this doesn’t quite work. For instance
if $n = 13$ we have $J((1101)_2) = (1011)_2$, but then $J((1011)_2) = (111)_2$ and
the process breaks down; the 0 disappears when it becomes the leading bit.
In fact, $J(n)$ must always be $\le n$ by definition, since $J(n)$ is the survivor’s
number; hence if $J(n) < n$ we can never get back up to $n$ by continuing to
iterate.

Repeated application of $J$ produces a sequence of decreasing values that
eventually reach a “fixed point,” where $J(n) = n$. The cyclic shift property
makes it easy to see what that fixed point will be: Iterating the function
enough times will always produce a pattern of all 1’s whose value is $2^{\nu(n)} - 1$,
where $\nu(n)$ is the number of 1 bits in the binary representation of $n$. Thus,
since $\nu(13) = 3$, we have


(“Iteration” here means applying a function to itself.)


$$\underbrace{J(J(\ldots J}_{2\text{ or more }J\text{'s}}(13)\ldots)) = 2^3 - 1 = 7;$$

similarly

$$\underbrace{J(J(\ldots J}_{8\text{ or more}}((101101101101011)_2)\ldots)) = 2^{10} - 1 = 1023.$$

Curious, but true. 
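The iteration is easy to watch in code. In this sketch (ours; `iterate_to_fixed_point` is a hypothetical helper name) we apply the cyclic-shift form of $J$ until nothing changes, and report the fixed point together with the number of applications that actually changed the value.

```python
def josephus(n: int) -> int:
    """J(n) as a one-bit cyclic left shift of n's binary digits."""
    bits = bin(n)[2:]
    return int(bits[1:] + bits[0], 2)


def iterate_to_fixed_point(n: int) -> tuple[int, int]:
    """Apply J until J(x) == x; return (x, number of value-changing applications)."""
    steps = 0
    while (nxt := josephus(n)) != n:
        n, steps = nxt, steps + 1
    return n, steps


for start in (13, 0b101101101101011):
    fixed, steps = iterate_to_fixed_point(start)
    # The fixed point is 2^nu(start) - 1, where nu counts the 1 bits.
    assert fixed == 2 ** bin(start).count("1") - 1
    print(start, fixed, steps)
```

The step counts, 2 and 8, match the minimum numbers of $J$’s shown in the two displays above.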

Let’s return briefly to our first guess, that J(n) = n/2 when n is even. 
This is obviously not true in general, but we can now determine exactly when 
it is true: 


Curiously enough, if $M$ is a compact $C^\infty$ $n$-manifold ($n > 1$), there exists a differentiable immersion of $M$ into $\mathbb{R}^{2n-\nu(n)}$ but not necessarily into $\mathbb{R}^{2n-\nu(n)-1}$.

I wonder if Josephus was secretly a topologist?


$$J(n) = n/2 \iff 2l + 1 = (2^m + l)/2 \iff l = \tfrac13(2^m - 2).$$

If this number $l = \frac13(2^m - 2)$ is an integer, then $n = 2^m + l$ will be a solution,
because $l$ will be less than $2^m$. It’s not hard to verify that $2^m - 2$ is a multiple
of 3 when $m$ is odd, but not when $m$ is even. (We will study such things in
Chapter 4.) Therefore there are infinitely many solutions to the equation
$J(n) = n/2$, beginning as follows:

m     l     n = 2^m + l     J(n) = 2l + 1 = n/2     n (binary)
1     0       2               1                     10
3     2      10               5                     1010
5    10      42              21                     101010
7    42     170              85                     10101010

Looks like Greek to me.


Notice the pattern in the rightmost column. These are the binary numbers 
for which cyclic-shifting one place left produces the same result as ordinary- 
shifting one place right (halving). 
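A brute-force search confirms the table. The sketch below (ours) uses closed form (1.9) and also checks the observation just made: for each solution, the cyclic left shift equals an ordinary right shift.

```python
def josephus(n: int) -> int:
    """Closed form (1.9): J(2^m + l) = 2l + 1."""
    m = n.bit_length() - 1
    return 2 * (n - (1 << m)) + 1


solutions = [n for n in range(2, 1000, 2) if josephus(n) == n // 2]
print(solutions)                       # [2, 10, 42, 170, 682]
for n in solutions:
    bits = bin(n)[2:]
    assert int(bits[1:] + bits[0], 2) == n >> 1   # cyclic left shift == halving
    print(n, bits)
```

The next solution after 170 is $682 = 2^9 + 170$, from $m = 9$, as the formula $l = \frac13(2^m - 2)$ predicts.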

OK, we understand the J function pretty well; the next step is to general- 
ize it. What would have happened if our problem had produced a recurrence 
that was something like (1.8), but with different constants? Then we might
not have been lucky enough to guess the solution, because the solution might
have been really weird. Let’s investigate this by introducing constants $\alpha$, $\beta$,
and $\gamma$ and trying to find a closed form for the more general recurrence

$$\begin{aligned} f(1) &= \alpha;\\ f(2n) &= 2f(n) + \beta, &&\text{for } n \ge 1;\\ f(2n+1) &= 2f(n) + \gamma, &&\text{for } n \ge 1. \end{aligned} \tag{1.11}$$

(Our original recurrence had $\alpha = 1$, $\beta = -1$, and $\gamma = 1$.) Starting with
$f(1) = \alpha$ and working our way up, we can construct the following general
table for small values of $n$:


n     f(n)
1     α
2     2α + β
3     2α + γ
4     4α + 3β
5     4α + 2β + γ
6     4α + β + 2γ
7     4α + 3γ
8     8α + 7β
9     8α + 6β + γ                                  (1.12)


It seems that $\alpha$’s coefficient is $n$’s largest power of 2. Furthermore, between
powers of 2, $\beta$’s coefficient decreases by 1 down to 0 and $\gamma$’s increases by 1
up from 0. Therefore if we express $f(n)$ in the form

$$f(n) = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma, \tag{1.13}$$

by separating out its dependence on $\alpha$, $\beta$, and $\gamma$, it seems that

$$\begin{aligned} A(n) &= 2^m;\\ B(n) &= 2^m - 1 - l;\\ C(n) &= l. \end{aligned} \tag{1.14}$$

Here, as usual, $n = 2^m + l$ and $0 \le l < 2^m$, for $n \ge 1$.

It’s not terribly hard to prove (1.13) and (1.14) by induction, but the 
calculations are messy and uninformative. Fortunately there’s a better way 
to proceed, by choosing particular values and then combining them. Let’s 
illustrate this by considering the special case $\alpha = 1$, $\beta = \gamma = 0$, when $f(n)$ is
supposed to be equal to $A(n)$: Recurrence (1.11) becomes

$$\begin{aligned} A(1) &= 1;\\ A(2n) &= 2A(n), &&\text{for } n \ge 1;\\ A(2n+1) &= 2A(n), &&\text{for } n \ge 1. \end{aligned}$$

Sure enough, it’s true (by induction on $m$) that $A(2^m + l) = 2^m$.

Next, let’s use recurrence (1.11) and solution (1.13) in reverse, by starting
with a simple function $f(n)$ and seeing if there are any constants $(\alpha, \beta, \gamma)$
that will define it. Plugging the constant function $f(n) = 1$ into (1.11) says that

$$\begin{aligned} 1 &= \alpha;\\ 1 &= 2 \cdot 1 + \beta;\\ 1 &= 2 \cdot 1 + \gamma; \end{aligned}$$

hence the values $(\alpha, \beta, \gamma) = (1, -1, -1)$ satisfying these equations will yield
$A(n) - B(n) - C(n) = f(n) = 1$. Similarly, we can plug in $f(n) = n$:

$$\begin{aligned} 1 &= \alpha;\\ 2n &= 2 \cdot n + \beta;\\ 2n + 1 &= 2 \cdot n + \gamma. \end{aligned}$$

These equations hold for all $n$ when $\alpha = 1$, $\beta = 0$, and $\gamma = 1$, so we don’t
need to prove by induction that these parameters will yield $f(n) = n$. We
already know that $f(n) = n$ will be the solution in such a case, because the
recurrence (1.11) uniquely defines $f(n)$ for every value of $n$.

And now we’re essentially done! We have shown that the functions $A(n)$,
$B(n)$, and $C(n)$ of (1.13), which solve (1.11) in general, satisfy the equations

$$\begin{aligned} A(n) &= 2^m, &&\text{where } n = 2^m + l \text{ and } 0 \le l < 2^m;\\ A(n) - B(n) - C(n) &= 1;\\ A(n) + C(n) &= n. \end{aligned}$$

Hold onto your hats, this next part is new stuff.

A neat idea!
Beware: The authors are expecting us to figure out the idea of the repertoire method from seat-of-the-pants examples, instead of giving us a top-down presentation. The method works best with recurrences that are “linear,” in the sense that the solutions can be expressed as a sum of arbitrary parameters multiplied by functions of $n$, as in (1.13). Equation (1.13) is the key.


(‘relax’ = ‘destroy’) 


I think I get it: The binary representations of $A(n)$, $B(n)$, and $C(n)$ have 1’s in different positions.


Our conjectures in (1.14) follow immediately, since we can solve these equations
to get $C(n) = n - A(n) = l$ and $B(n) = A(n) - 1 - C(n) = 2^m - 1 - l$.

This approach illustrates a surprisingly useful repertoire method for solv- 
ing recurrences. First we find settings of general parameters for which we 
know the solution; this gives us a repertoire of special cases that we can solve. 
Then we obtain the general case by combining the special cases. We need as 
many independent special solutions as there are independent parameters (in 
this case three, for $\alpha$, $\beta$, and $\gamma$). Exercises 16 and 20 provide further examples
of the repertoire approach. 
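The repertoire argument proves (1.14); a numerical check is still reassuring. This sketch (ours, with hypothetical function names) runs recurrence (1.11) on coefficient triples $(A(n), B(n), C(n))$ instead of numbers and compares them with the conjectured formulas and the two repertoire identities.

```python
def coefficients(n: int) -> tuple[int, int, int]:
    """(A(n), B(n), C(n)) such that f(n) = A*alpha + B*beta + C*gamma under (1.11)."""
    if n == 1:
        return 1, 0, 0                        # f(1) = alpha
    a, b, c = coefficients(n // 2)
    if n % 2 == 0:
        return 2 * a, 2 * b + 1, 2 * c        # f(2k) = 2 f(k) + beta
    return 2 * a, 2 * b, 2 * c + 1            # f(2k+1) = 2 f(k) + gamma


def conjectured(n: int) -> tuple[int, int, int]:
    """Formulas (1.14), with n = 2^m + l and 0 <= l < 2^m."""
    m = n.bit_length() - 1
    l = n - (1 << m)
    return 1 << m, (1 << m) - 1 - l, l


for n in range(1, 1024):
    a, b, c = coefficients(n)
    assert (a, b, c) == conjectured(n)
    assert a - b - c == 1 and a + c == n      # the repertoire identities

print(coefficients(6))   # (4, 1, 2), i.e. f(6) = 4*alpha + beta + 2*gamma
```

A finite check like this cannot replace the proof, but it catches algebra slips quickly.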

We know that the original J-recurrence has a magical solution, in binary: 

$$J\bigl((b_m b_{m-1} \ldots b_1 b_0)_2\bigr) = (b_{m-1} \ldots b_1 b_0 b_m)_2, \qquad \text{where } b_m = 1.$$

Does the generalized Josephus recurrence admit of such magic? 

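Sure, why not? We can rewrite the generalized recurrence (1.11) as

$$\begin{aligned} f(1) &= \alpha;\\ f(2n+j) &= 2f(n) + \beta_j, &&\text{for } j = 0, 1 \text{ and } n \ge 1, \end{aligned} \tag{1.15}$$

if we let $\beta_0 = \beta$ and $\beta_1 = \gamma$. And this recurrence unfolds, binary-wise:

$$\begin{aligned} f\bigl((b_m b_{m-1} \ldots b_1 b_0)_2\bigr) &= 2f\bigl((b_m b_{m-1} \ldots b_1)_2\bigr) + \beta_{b_0}\\ &= 4f\bigl((b_m b_{m-1} \ldots b_2)_2\bigr) + 2\beta_{b_1} + \beta_{b_0}\\ &\;\;\vdots\\ &= 2^m f\bigl((b_m)_2\bigr) + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0}\\ &= 2^m \alpha + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0}. \end{aligned}$$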

Suppose we now relax the radix 2 notation to allow arbitrary digits instead 
of just 0 and 1. The derivation above tells us that

$$f\bigl((b_m b_{m-1} \ldots b_1 b_0)_2\bigr) = (\alpha\, \beta_{b_{m-1}}\, \beta_{b_{m-2}} \ldots \beta_{b_1}\, \beta_{b_0})_2. \tag{1.16}$$

Nice. We would have seen this pattern earlier if we had written (1.12) in 
another way: 




n     f(n)
1     α
2     2α + β
3     2α + γ
4     4α + 2β + β
5     4α + 2β + γ
6     4α + 2γ + β
7     4α + 2γ + γ
For example, when $n = 100 = (1100100)_2$, our original Josephus values
$\alpha = 1$, $\beta = -1$, and $\gamma = 1$ yield

$$\begin{aligned} n &= (1\;1\;0\;0\;1\;0\;0)_2 = 100,\\ f(n) &= (1\;1\;{-1}\;{-1}\;1\;{-1}\;{-1})_2 = +64 + 32 - 16 - 8 + 4 - 2 - 1 = 73, \end{aligned}$$

as before. The cyclic-shift property follows because each block of binary digits
$(1\,0 \ldots 0\,0)_2$ in the representation of $n$ is transformed into

$$(1\;{-1} \ldots {-1}\;{-1})_2 = (0\,0 \ldots 0\,1)_2.$$
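Formula (1.16) suggests evaluating $f$ by digit substitution followed by ordinary radix-2 evaluation, where the “digits” may be any integers. A sketch under that reading (ours; the function names are hypothetical), checked against the recurrence itself:

```python
def f_by_digits(n: int, alpha: int, beta: int, gamma: int) -> int:
    """Evaluate (1.11) via (1.16): leading bit -> alpha, other 0s -> beta, other 1s -> gamma."""
    bits = bin(n)[2:]
    digits = [alpha] + [beta if b == "0" else gamma for b in bits[1:]]
    value = 0
    for d in digits:            # Horner's rule in radix 2; digits may be negative
        value = 2 * value + d
    return value


def f_by_recurrence(n: int, alpha: int, beta: int, gamma: int) -> int:
    """Direct evaluation of recurrence (1.11)."""
    if n == 1:
        return alpha
    return 2 * f_by_recurrence(n // 2, alpha, beta, gamma) + (beta if n % 2 == 0 else gamma)


print(f_by_digits(100, 1, -1, 1))   # 73, the Josephus number J(100)
```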

So our change of notation has given us the compact solution (1.16) to the
general recurrence (1.15). If we’re really uninhibited we can now generalize
even more. The recurrence

$$\begin{aligned} f(j) &= \alpha_j, &&\text{for } 1 \le j < d;\\ f(dn+j) &= c f(n) + \beta_j, &&\text{for } 0 \le j < d \text{ and } n \ge 1, \end{aligned} \tag{1.17}$$

is the same as the previous one except that we start with numbers in radix $d$
and produce values in radix $c$. That is, it has the radix-changing solution

$$f\bigl((b_m b_{m-1} \ldots b_1 b_0)_d\bigr) = (\alpha_{b_m}\, \beta_{b_{m-1}}\, \beta_{b_{m-2}} \ldots \beta_{b_1}\, \beta_{b_0})_c. \tag{1.18}$$

For example, suppose that by some stroke of luck we’re given the recurrence 

$$\begin{aligned} f(1) &= 34,\\ f(2) &= 5,\\ f(3n) &= 10f(n) + 76, &&\text{for } n \ge 1,\\ f(3n+1) &= 10f(n) - 2, &&\text{for } n \ge 1,\\ f(3n+2) &= 10f(n) + 8, &&\text{for } n \ge 1, \end{aligned}$$

and suppose we want to compute $f(19)$. Here we have $d = 3$ and $c = 10$. Now
$19 = (201)_3$, and the radix-changing solution tells us to perform a digit-by-digit
replacement from radix 3 to radix 10. So the leading 2 becomes a 5, and
the 0 and 1 become 76 and $-2$, giving

$$f(19) = f\bigl((201)_3\bigr) = (5\;76\;{-2})_{10} = 1258,$$

which is our answer. 
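The radix-changing solution (1.18) is a digit-by-digit substitution, which the following sketch (ours; the dictionaries `alphas` and `betas` are an assumed encoding of $\alpha_j$ and $\beta_j$) carries out and compares with direct evaluation of (1.17).

```python
def radix_change(n: int, d: int, c: int, alphas: dict[int, int], betas: dict[int, int]) -> int:
    """Evaluate (1.18): leading radix-d digit b -> alphas[b], other digits b -> betas[b], read in radix c."""
    digits = []
    while n > 0:
        n, r = divmod(n, d)
        digits.append(r)
    digits.reverse()                        # most significant digit first
    value = alphas[digits[0]]
    for b in digits[1:]:
        value = c * value + betas[b]        # Horner's rule in radix c
    return value


def direct(n: int, d: int, c: int, alphas: dict[int, int], betas: dict[int, int]) -> int:
    """Evaluate recurrence (1.17) directly."""
    if n < d:
        return alphas[n]
    q, j = divmod(n, d)
    return c * direct(q, d, c, alphas, betas) + betas[j]


alphas = {1: 34, 2: 5}
betas = {0: 76, 1: -2, 2: 8}
print(radix_change(19, 3, 10, alphas, betas))   # 1258
```

With $d = c = 2$, `alphas = {1: 1}`, and `betas = {0: -1, 1: 1}`, the same function reproduces the Josephus numbers.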

Thus Josephus and the Jewish-Roman war have led us to some interesting 
general recurrences. 


“There are two kinds of generalizations. One is cheap and the other is valuable.

It is easy to generalize by diluting a little idea with a big terminology.

It is much more difficult to prepare a refined and condensed extract from several good ingredients.”

— G. Polya [297] 


Perhaps this was a 
stroke of bad luck. 


But in general I’m 
against recurrences 
of war. 



Please do all the 
warmups in all the 
chapters! 

— The Mgm ’t 
Exercises

Warmups 

1 All horses are the same color; we can prove this by induction on the
number of horses in a given set. Here’s how: “If there’s just one horse
then it’s the same color as itself, so the basis is trivial. For the induction
step, assume that there are $n$ horses numbered 1 to $n$. By the induction
hypothesis, horses 1 through $n - 1$ are the same color, and similarly
horses 2 through $n$ are the same color. But the middle horses, 2 through
$n - 1$, can’t change color when they’re in different groups; these are
horses, not chameleons. So horses 1 and n must be the same color as 
well, by transitivity. Thus all n horses are the same color; QED.” What, 
if anything, is wrong with this reasoning? 

2 Find the shortest sequence of moves that transfers a tower of n disks 
from the left peg A to the right peg B, if direct moves between A and B 
are disallowed. (Each move must be to or from the middle peg. As usual, 
a larger disk must never appear above a smaller one.) 

3 Show that, in the process of transferring a tower under the restrictions of 
the preceding exercise, we will actually encounter every properly stacked 
arrangement of n disks on three pegs. 

4 Are there any starting and ending configurations of $n$ disks on three pegs
that are more than $2^n - 1$ moves apart, under Lucas’s original rules?

5 A “Venn diagram” with three overlapping circles is often used to illustrate 
the eight possible subsets associated with three given sets: 



Can the sixteen possibilities that arise with four given sets be illustrated 
by four overlapping circles? 

6 Some of the regions defined by n lines in the plane are infinite, while 
others are bounded. What’s the maximum possible number of bounded 
regions? 

7 Let $H(n) = J(n+1) - J(n)$. Equation (1.8) tells us that $H(2n) = 2$, and
$H(2n+1) = J(2n+2) - J(2n+1) = \bigl(2J(n+1) - 1\bigr) - \bigl(2J(n) + 1\bigr) = 2H(n) - 2$,
for all $n \ge 1$. Therefore it seems possible to prove that $H(n) = 2$ for all $n$,
by induction on $n$. What’s wrong here?
Homework exercises

8 Solve the recurrence 

$$Q_0 = \alpha, \quad Q_1 = \beta; \qquad Q_n = (1 + Q_{n-1})/Q_{n-2}, \quad \text{for } n > 1.$$

Assume that $Q_n \ne 0$ for all $n \ge 0$. Hint: $Q_4 = (1 + \alpha)/\beta$.

9 Sometimes it’s possible to use induction backwards, proving things from 
n to n - 1 instead of vice versa! For example, consider the statement 

$$P(n):\quad x_1 \ldots x_n \le \Bigl(\frac{x_1 + \cdots + x_n}{n}\Bigr)^n, \qquad \text{if } x_1, \ldots, x_n \ge 0.$$

This is true when $n = 2$, since $(x_1 + x_2)^2 - 4x_1x_2 = (x_1 - x_2)^2 \ge 0$.

a By setting $x_n = (x_1 + \cdots + x_{n-1})/(n-1)$, prove that $P(n)$ implies $P(n-1)$ whenever $n > 1$.
b Show that P(n) and P(2) imply P(2n). 
c Explain why this implies the truth of P(n) for all n. 

10 Let $Q_n$ be the minimum number of moves needed to transfer a tower of
$n$ disks from A to B if all moves must be clockwise, that is, from A
to B, or from B to the other peg, or from the other peg to A. Also let $R_n$
be the minimum number of moves needed to go from B back to A under
this restriction. Prove that

$$Q_n = \begin{cases} 0, & \text{if } n = 0;\\ 2R_{n-1} + 1, & \text{if } n > 0; \end{cases} \qquad R_n = \begin{cases} 0, & \text{if } n = 0;\\ Q_n + Q_{n-1} + 1, & \text{if } n > 0. \end{cases}$$

(You need not solve these recurrences; we’ll see how to do that in Chap- 
ter 7.) 

11 A Double Tower of Hanoi contains 2n disks of n different sizes, two of 
each size. As usual, we’re required to move only one disk at a time, 
without putting a larger one over a smaller one. 

a How many moves does it take to transfer a double tower from one 
peg to another, if disks of equal size are indistinguishable from each 
other? 

b What if we are required to reproduce the original top-to-bottom 
order of all the equal-size disks in the final arrangement? [Hint: 
This is difficult — it’s really a “bonus problem.”] 

12 Let’s generalize exercise 11a even further, by assuming that there are 
$n$ different sizes of disks and exactly $m_k$ disks of size $k$. Determine
$A(m_1, \ldots, m_n)$, the minimum number of moves needed to transfer a
tower when equal-size disks are considered to be indistinguishable. 


. . . now that’s a 
horse of a different 
color. 
Good luck keeping the cheese in position.


13 What’s the maximum number of regions definable by $n$ zig-zag lines,
each of which consists of two parallel infinite half-lines joined by a straight
segment?

[Figure: two zig-zag lines, with $ZZ_2 = 12$.]

14 How many pieces of cheese can you obtain from a single thick piece by 
making five straight slices? (The cheese must stay in its original position 
while you do all the cutting, and each slice must correspond to a plane 
in 3D.) Find a recurrence relation for $P_n$, the maximum number of three-dimensional
regions that can be defined by $n$ different planes.

15 Josephus had a friend who was saved by getting into the next-to-last 
position. What is I(n), the number of the penultimate survivor when 
every second person is executed? 

16 Use the repertoire method to solve the general four-parameter recurrence 

$$g(1) = \alpha; \qquad g(2n+j) = 3g(n) + \gamma n + \beta_j, \quad \text{for } j = 0, 1 \text{ and } n \ge 1.$$

Hint: Try the function g(n) = n. 

Exam problems 

17 If $W_n$ is the minimum number of moves needed to transfer a tower of $n$
disks from one peg to another when there are four pegs instead of three, 
show that 



Is this like a 
five-star general 
recurrence? 


$$W_{n(n+1)/2} \le 2W_{n(n-1)/2} + T_n, \qquad \text{for } n > 0.$$

(Here $T_n = 2^n - 1$ is the ordinary three-peg number.) Use this to find a
closed form $f(n)$ such that $W_{n(n+1)/2} \le f(n)$ for all $n \ge 0$.

18 Show that the following set of $n$ bent lines defines $Z_n$ regions, where $Z_n$
is defined in (1.7): The $j$th bent line, for $1 \le j \le n$, has its zig at $(n^{2j}, 0)$
and goes up through the points $(n^{2j} - n^j, 1)$ and $(n^{2j} - n^j - n^{-n}, 1)$.

19 Is it possible to obtain Z n regions with n bent lines when the angle at 
each zig is 30° ? 

20 Use the repertoire method to solve the general five-parameter recurrence 

$$h(1) = \alpha; \qquad h(2n+j) = 4h(n) + \gamma_j n + \beta_j, \quad \text{for } j = 0, 1 \text{ and } n \ge 1.$$

Hint: Try the functions $h(n) = n$ and $h(n) = n^2$.
21 Suppose there are $2n$ people in a circle; the first $n$ are “good guys”
and the last $n$ are “bad guys.” Show that there is always an integer $m$
(depending on $n$) such that, if we go around the circle executing every
$m$th person, all the bad guys are first to go. (For example, when $n = 3$
we can take $m = 5$; when $n = 4$ we can take $m = 30$.)

Bonus problems 

22 Show that it’s possible to construct a Venn diagram for all $2^n$ possible
subsets of n given sets, using n convex polygons that are congruent to 
each other and rotated about a common center. 

23 Suppose that Josephus finds himself in a given position j, but he has a 
chance to name the elimination parameter q such that every qth person 
is executed. Can he always save himself? 

Research problems 

24 Find all recurrence relations of the form 

$$X_n = \frac{1 + a_1 X_{n-1} + \cdots + a_k X_{n-k}}{b_1 X_{n-1} + \cdots + b_k X_{n-k}}$$

whose solution is periodic.

25 Solve infinitely many cases of the four-peg Tower of Hanoi problem by 
proving that equality holds in the relation of exercise 17. 

26 Generalizing exercise 23, let’s say that a Josephus subset of $\{1, 2, \ldots, n\}$
is a set of $k$ numbers such that, for some $q$, the people with the other $n - k$
numbers will be eliminated first. (These are the $k$ positions of the “good
guys” Josephus wants to save.) It turns out that when $n = 9$, three of the
$2^9$ possible subsets are non-Josephus, namely $\{1, 2, 5, 8, 9\}$, $\{2, 3, 4, 5, 8\}$,
and $\{2, 5, 6, 7, 8\}$. There are 13 non-Josephus sets when $n = 12$, none for
any other values of $n \le 12$. Are non-Josephus subsets rare for large $n$?


Yes, and well done 
if you find them. 



A term is how long 
this course lasts. 





Sums 


SUMS ARE EVERYWHERE in mathematics, so we need basic tools to handle 
them. This chapter develops the notation and general techniques that make 
summation user-friendly. 

2.1 NOTATION 

In Chapter 1 we encountered the sum of the first $n$ integers, which
we wrote out as $1 + 2 + 3 + \cdots + (n-1) + n$. The ‘$\cdots$’ in such formulas tells
us to complete the pattern established by the surrounding terms. Of course
we have to watch out for sums like $1 + 7 + \cdots + 41.7$, which are meaningless
without a mitigating context. On the other hand, the inclusion of terms like
3 and $(n-1)$ was a bit of overkill; the pattern would presumably have been
clear if we had written simply $1 + 2 + \cdots + n$. Sometimes we might even be
so bold as to write just $1 + \cdots + n$.

We’ll be working with sums of the general form 

$$a_1 + a_2 + \cdots + a_n, \tag{2.1}$$

where each $a_k$ is a number that has been defined somehow. This notation has
the advantage that we can “see” the whole sum, almost as if it were written 
out in full, if we have a good enough imagination. 

Each element of a sum is called a term. The terms are often specified 
implicitly as formulas that follow a readily perceived pattern, and in such cases 
we must sometimes write them in an expanded form so that the meaning is 
clear. For example, if 

$$1 + 2 + \cdots + 2^{n-1}$$

is supposed to denote a sum of $n$ terms, not of $2^{n-1}$, we should write it more
explicitly as

$$2^0 + 2^1 + \cdots + 2^{n-1}.$$


The three-dots notation has many uses, but it can be ambiguous and a
bit long-winded. Other alternatives are available, notably the delimited form

$$\sum_{k=1}^{n} a_k, \tag{2.2}$$

which is called Sigma-notation because it uses the Greek letter $\Sigma$ (uppercase
sigma). This notation tells us to include in the sum precisely those
terms whose index $k$ is an integer that lies between the lower and upper
limits 1 and $n$, inclusive. In words, we “sum over $k$, from 1 to $n$.” Joseph
Fourier introduced this delimited $\Sigma$-notation in 1820, and it soon took the
mathematical world by storm.

Incidentally, the quantity after Σ (here $a_k$) is called the summand.

The index variable k is said to be bound to the Σ sign in (2.2), because
the k in $a_k$ is unrelated to appearances of k outside the Sigma-notation. Any
other letter could be substituted for k here without changing the meaning of
(2.2). The letter i is often used (perhaps because it stands for “index”), but
we’ll generally sum on k since it’s wise to keep i for $\sqrt{-1}$.

It turns out that a generalized Sigma-notation is even more useful than
the delimited form: We simply write one or more conditions under the Σ,
to specify the set of indices over which summation should take place. For
example, the sums in (2.1) and (2.2) can also be written as


“The sign Σ
indicates that one
must give the
integer i all of
its values 1, 2, 3,
. . . , and take the
sum of the terms.”

— J. Fourier [127] 


Well, I wouldn’t
want to use a or n
as the index variable
instead of k in
(2.2); those letters
are “free variables”
that do have meaning
outside the Σ
here.


$$\sum_{1 \le k \le n} a_k . \qquad (2.3)$$

In this particular example there isn’t much difference between the new form
and (2.2), but the general form allows us to take sums over index sets that
aren’t restricted to consecutive integers. For example, we can express the sum
of the squares of all odd positive integers below 100 as follows:

$$\sum_{\substack{1 \le k < 100 \\ k \text{ odd}}} k^2 .$$


The delimited equivalent of this sum,

$$\sum_{k=0}^{49} (2k+1)^2 ,$$

is more cumbersome and less clear. Similarly, the sum of reciprocals of all
prime numbers between 1 and N is

$$\sum_{\substack{p \le N \\ p \text{ prime}}} \frac{1}{p} ;$$


the delimited form would require us to write

$$\sum_{k=1}^{\pi(N)} \frac{1}{p_k} ,$$

where $p_k$ denotes the kth prime and $\pi(N)$ is the number of primes $\le N$.
(Incidentally, this sum gives the approximate average number of distinct
prime factors of a random integer near N, since about 1/p of those integers
are divisible by p. Its value for large N is approximately $\ln\ln N + M$,
where $M \approx 0.2614972128476427837554268386086958590515666$ is Mertens’s
constant [271]; $\ln x$ stands for the natural logarithm of x, and $\ln\ln x$ stands
for $\ln(\ln x)$.)
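This small Python sketch (our own illustration, not part of the text) puts the general and delimited forms side by side. It evaluates the odd-squares sum both ways and compares the reciprocal-prime sum with $\ln\ln N + M$. The sieve helper `primes_upto` is a hypothetical name, and the cutoff N = 10^5 is an arbitrary choice.

```python
from math import log

def primes_upto(n):
    """Return the primes p with 2 <= p <= n, via a sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [False, False] + [True] * (n - 1)   # indices 0..n
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for m in range(i * i, n + 1, i):
                is_prime[m] = False
    return [p for p, flag in enumerate(is_prime) if flag]

# General form: the index set is {k : 1 <= k < 100, k odd}.
general = sum(k * k for k in range(1, 100) if k % 2 == 1)

# Delimited form: k runs over 0..49 and the summand is rewritten as (2k + 1)^2.
delimited = sum((2 * k + 1) ** 2 for k in range(0, 50))

# Mertens's constant, truncated to double precision.
M = 0.2614972128476427837554268386086958590515666

N = 10 ** 5
reciprocal_prime_sum = sum(1 / p for p in primes_upto(N))
estimate = log(log(N)) + M

print(general, delimited)                 # 166650 166650
print(reciprocal_prime_sum, estimate)     # close to each other for large N
```

The `if` clause in the first comprehension mirrors the general form, which places a condition on k. The rewritten summand over `range(0, 50)` mirrors the delimited form.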

The summation
symbol looks like
a distorted pacman.

The biggest advantage of general Sigma-notation is that we can manipulate
it more easily than the delimited form. For example, suppose we want
to change the index variable k to k + 1. With the general form, we have

$$\sum_{1 \le k \le n} a_k = \sum_{1 \le k+1 \le n} a_{k+1} ;$$

it’s easy to see what’s going on, and we can do the substitution almost without
thinking. But with the delimited form, we have

$$\sum_{k=1}^{n} a_k = \sum_{k=0}^{n-1} a_{k+1} ;$$


A tidy sum. 


That’s nothing.

You should see how
many times Σ appears
in The Iliad.


it’s harder to see what’s happened, and we’re more likely to make a mistake. 

On the other hand, the delimited form isn’t completely useless. It’s
nice and tidy, and we can write it quickly because (2.2) has seven symbols
compared with (2.3)’s eight. Therefore we’ll often use Σ with upper and
lower delimiters when we state a problem or present a result, but we’ll prefer
to work with relations-under-Σ when we’re manipulating a sum whose index
variables need to be transformed.

The Σ sign occurs more than 1000 times in this book, so we should be
sure that we know exactly what it means. Formally, we write

$$\sum_{P(k)} a_k \qquad (2.4)$$
as an abbreviation for the sum of all terms $a_k$ such that k is an integer
satisfying a given property P(k). (A “property P(k)” is any statement about
k that can be either true or false.) For the time being, we’ll assume that
only finitely many integers k satisfying P(k) have $a_k \ne 0$; otherwise infinitely
many nonzero numbers are being added together, and things can get a bit
tricky. At the other extreme, if P(k) is false for all integers k, we have an
“empty” sum; the value of an empty sum is defined to be zero.

A slightly modified form of (2.4) is used when a sum appears within the
text of a paragraph rather than in a displayed equation: We write ‘$\sum_{P(k)} a_k$’,
attaching property P(k) as a subscript of Σ, so that the formula won’t stick
out too much. Similarly, ‘$\sum_{k=1}^{n} a_k$’ is a convenient alternative to (2.2) when
we want to confine the notation to a single line.

People are often tempted to write

$$\sum_{k=2}^{n-1} k(k-1)(n-k) \quad\text{instead of}\quad \sum_{k=0}^{n} k(k-1)(n-k)$$

because the terms for k = 0, 1, and n in this sum are zero. Somehow it
seems more efficient to add up n − 2 terms instead of n + 1 terms. But such
temptations should be resisted; efficiency of computation is not the same as
efficiency of understanding! We will find it advantageous to keep upper and
lower bounds on an index of summation as simple as possible, because sums
can be manipulated much more easily when the bounds are simple. Indeed,
the form $\sum_{k=2}^{n-1}$ can even be dangerously ambiguous, because its meaning is
not at all clear when n = 0 or n = 1 (see exercise 1). Zero-valued terms cause
no harm, and they often save a lot of trouble.
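The claim that zero terms are harmless can be checked mechanically. In this Python sketch (function names are our own), the full-range sum and the trimmed sum agree for every n ≥ 0. They agree only because Python's `range(2, n)` treats "k = 2 up to n − 1" as empty when n ≤ 2. That is a convention of Python, not of the notation.

```python
def full_range(n):
    """Sum over 0 <= k <= n; the k = 0, 1, n terms are zero and do no harm."""
    return sum(k * (k - 1) * (n - k) for k in range(0, n + 1))

def trimmed_range(n):
    """Sum over 2 <= k <= n - 1.

    range(2, n) is empty when n <= 2, so for n = 0 or 1 this silently relies
    on the convention that an empty sum is zero; other readings of the
    bounds are possible, which is the ambiguity the text warns about.
    """
    return sum(k * (k - 1) * (n - k) for k in range(2, n))

for n in range(6):
    print(n, full_range(n), trimmed_range(n))
```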

So far the notations we’ve been discussing are quite standard, but now 
we are about to make a radical departure from tradition. Kenneth E. Iverson 
introduced a wonderful idea in his programming language APL [191, page 11;
see also 220], and we’ll see that it greatly simplifies many of the things we
want to do in this book. The idea is simply to enclose a true-or-false statement 
in brackets, and to say that the result is 1 if the statement is true, 0 if the 
statement is false. For example, 

$$[p \text{ prime}] = \begin{cases} 1, & \text{if } p \text{ is a prime number;} \\ 0, & \text{if } p \text{ is not a prime number.} \end{cases}$$

Iverson’s convention allows us to express sums with no constraints whatever
on the index of summation, because we can rewrite (2.4) in the form

$$\sum_k a_k \, [P(k)] . \qquad (2.5)$$


Hey: The “Kronecker
delta” that
I’ve seen in other
books (I mean
$\delta_{kn}$, which is 1 if
k = n, 0 otherwise)
is just a
special case of
Iverson’s convention:
We can write
[k = n] instead.


If P(k) is false, the term $a_k[P(k)]$ is zero, so we can safely include it among
the terms being summed. This makes it easy to manipulate the index of
summation, because we don’t have to fuss with boundary conditions.

“I am often surprised
by new, important
applications [of this
notation].”

— B. de Finetti [123]

A slight technicality needs to be mentioned: Sometimes $a_k$ isn’t defined
for all integers k. We get around this difficulty by assuming that [P(k)] is
“very strongly zero” when P(k) is false; it’s so much zero, it makes $a_k[P(k)]$
equal to zero even when $a_k$ is undefined. For example, if we use Iverson’s
convention to write the sum of reciprocal primes $\le N$ as

$$\sum_p [p \text{ prime}]\,[p \le N]/p ,$$

there’s no problem of division by zero when p = 0, because our convention
tells us that $[0 \text{ prime}]\,[0 \le N]/0 = 0$.

. . . and it’s less
likely to lose points
on an exam for
“lack of rigor.”
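In Python, a conditional expression is a reasonable model of "very strongly zero", because the summand is never evaluated when the condition is false. The sketch below works under that assumption. `iverson`, `is_prime`, and `strongly_zero` are hypothetical helper names, and the finite window of p values stands in for "all integers".

```python
from fractions import Fraction

def iverson(statement):
    """Iverson bracket [statement]: 1 if true, 0 if false."""
    return 1 if statement else 0

def is_prime(p):
    """Trial division; 0, 1, and negative numbers are not prime."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def strongly_zero(condition, summand):
    """Like [condition] * summand(), except summand is never called when the
    condition is false, so an undefined term such as 1/0 causes no trouble."""
    return summand() if condition else 0

N = 30
# Sum "over all p" in a window that includes p = 0 and negative p.
total = sum(
    strongly_zero(is_prime(p) and p <= N, lambda p=p: Fraction(1, p))
    for p in range(-N, 2 * N + 1)
)
# By contrast, iverson(False) * Fraction(1, 0) would raise ZeroDivisionError,
# because Python evaluates both factors before multiplying them.
```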

Let’s sum up what we’ve discussed so far about sums. There are two
good ways to express a sum of terms: One way uses ‘···’, the other uses
‘Σ’. The three-dots form often suggests useful manipulations, particularly
the combination of adjacent terms, since we might be able to spot a simplifying
pattern if we let the whole sum hang out before our eyes. But too much detail
can also be overwhelming. Sigma-notation is compact, impressive to family
and friends, and often suggestive of manipulations that are not obvious in
three-dots form. When we work with Sigma-notation, zero terms are not
generally harmful; in fact, zeros often make Σ-manipulation easier.


2.2 SUMS AND RECURRENCES 

OK, we understand now how to express sums with fancy notation. 
But how does a person actually go about finding the value of a sum? One way 
is to observe that there’s an intimate relation between sums and recurrences. 
The sum

$$S_n = \sum_{k=0}^{n} a_k$$

is equivalent to the recurrence

$$S_0 = a_0 ; \qquad S_n = S_{n-1} + a_n , \quad \text{for } n > 0. \qquad (2.6)$$

(Think of $S_n$ as
not just a single
number, but as a
sequence defined for
all n ≥ 0.)


Therefore we can evaluate sums in closed form by using the methods we 
learned in Chapter 1 to solve recurrences in closed form. 

For example, if $a_n$ is equal to a constant plus a multiple of n, the sum-recurrence (2.6) takes the following general form:

$$R_0 = \alpha ; \qquad R_n = R_{n-1} + \beta + \gamma n , \quad \text{for } n > 0. \qquad (2.7)$$

Proceeding as in Chapter 1, we find $R_1 = \alpha + \beta + \gamma$, $R_2 = \alpha + 2\beta + 3\gamma$, and
so on; in general the solution can be written in the form

$$R_n = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma , \qquad (2.8)$$
where A(n), B(n), and C(n) are the coefficients of dependence on the general
parameters α, β, and γ.

The repertoire method tells us to try plugging in simple functions of n
for $R_n$, hoping to find constant parameters α, β, and γ where the solution is
especially simple. Setting $R_n = 1$ implies α = 1, β = 0, γ = 0; hence

$$A(n) = 1 .$$

Setting $R_n = n$ implies α = 0, β = 1, γ = 0; hence

$$B(n) = n .$$

Setting $R_n = n^2$ implies α = 0, β = −1, γ = 2; hence

$$2C(n) - B(n) = n^2$$

and we have $C(n) = (n^2 + n)/2$. Easy as pie.

Actually easier;
$\pi = \sum_{n \ge 0} \dfrac{8}{(4n+1)(4n+3)}$.

Therefore if we wish to evaluate

$$\sum_{k=0}^{n} (a + bk) ,$$

the sum-recurrence (2.6) boils down to (2.7) with α = β = a, γ = b, and the
answer is $aA(n) + aB(n) + bC(n) = a(n+1) + b(n+1)n/2$.
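Here is a quick numerical check of the repertoire solution, written as a Python sketch with our own helper names. It iterates (2.7) directly and compares the result with A(n)α + B(n)β + C(n)γ. `fractions.Fraction` keeps C(n) = (n² + n)/2 exact.

```python
from fractions import Fraction

def R(n, alpha, beta, gamma):
    """Iterate recurrence (2.7): R_0 = alpha, R_n = R_{n-1} + beta + gamma*n."""
    r = alpha
    for m in range(1, n + 1):
        r += beta + gamma * m
    return r

# Coefficients found by the repertoire method.
def A(n):
    return 1

def B(n):
    return n

def C(n):
    return Fraction(n * n + n, 2)

def closed_form(n, alpha, beta, gamma):
    """(2.8): R_n = A(n) alpha + B(n) beta + C(n) gamma."""
    return A(n) * alpha + B(n) * beta + C(n) * gamma

def arithmetic_sum(a, b, n):
    """sum_{k=0}^{n} (a + b k), obtained by taking alpha = beta = a, gamma = b."""
    return closed_form(n, a, a, b)
```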

Conversely, many recurrences can be reduced to sums; therefore the spe- 
cial methods for evaluating sums that we’ll be learning later in this chapter 
will help us solve recurrences that might otherwise be difficult. The Tower of 
Hanoi recurrence is a case in point: 

$$T_0 = 0 ; \qquad T_n = 2T_{n-1} + 1 , \quad \text{for } n > 0.$$

It can be put into the special form (2.6) if we divide both sides by $2^n$:

$$T_0/2^0 = 0 ; \qquad T_n/2^n = T_{n-1}/2^{n-1} + 1/2^n , \quad \text{for } n > 0.$$

Now we can set $S_n = T_n/2^n$, and we have

$$S_0 = 0 ; \qquad S_n = S_{n-1} + 2^{-n} , \quad \text{for } n > 0.$$


It follows that

$$S_n = \sum_{k=1}^{n} 2^{-k} .$$

(Notice that we’ve left the term for k = 0 out of this sum.) The sum of the
geometric series $2^{-1} + 2^{-2} + \cdots + 2^{-n} = (\frac{1}{2})^1 + (\frac{1}{2})^2 + \cdots + (\frac{1}{2})^n$ will be derived
later in this chapter; it turns out to be $1 - (\frac{1}{2})^n$. Hence $T_n = 2^n S_n = 2^n - 1$.

We have converted $T_n$ to $S_n$ in this derivation by noticing that the recurrence could be divided by $2^n$. This trick is a special case of a general
technique that can reduce virtually any recurrence of the form

$$a_n T_n = b_n T_{n-1} + c_n \qquad (2.9)$$

to a sum. The idea is to multiply both sides by a summation factor, $s_n$:

$$s_n a_n T_n = s_n b_n T_{n-1} + s_n c_n .$$

This factor $s_n$ is cleverly chosen to make

$$s_n b_n = s_{n-1} a_{n-1} .$$


Then if we write $S_n = s_n a_n T_n$ we have a sum-recurrence,

$$S_n = S_{n-1} + s_n c_n .$$

Hence

$$S_n = s_0 a_0 T_0 + \sum_{k=1}^{n} s_k c_k = s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k ,$$

and the solution to the original recurrence (2.9) is

$$T_n = \frac{1}{s_n a_n}\Bigl(s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k\Bigr) . \qquad (2.10)$$

(The value of $s_1$
cancels out, so it
can be anything
but zero.)

For example, when n = 1 we get $T_1 = (s_1 b_1 T_0 + s_1 c_1)/s_1 a_1 = (b_1 T_0 + c_1)/a_1$.

But how can we be clever enough to find the right $s_n$? No problem: The
relation $s_n = s_{n-1} a_{n-1}/b_n$ can be unfolded to tell us that the fraction

$$s_n = \frac{a_{n-1} a_{n-2} \ldots a_1}{b_n b_{n-1} \ldots b_2} , \qquad (2.11)$$

or any convenient constant multiple of this value, will be a suitable summation
factor. For example, the Tower of Hanoi recurrence has $a_n = 1$ and $b_n = 2$;
the general method we’ve just derived says that $s_n = 2^{-n}$ is a good thing to
multiply by, if we want to reduce the recurrence to a sum. We don’t need a
brilliant flash of inspiration to discover this multiplier.

We must be careful, as always, not to divide by zero. The summation- 
factor method works whenever all the a’s and all the b’s are nonzero. 
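The summation-factor recipe is mechanical enough to code directly. This minimal sketch assumes the recurrence is supplied as three Python functions a, b, c of n, and that every $a_k$ and $b_k$ it uses is nonzero. It takes $s_1 = 1$ in (2.11), applies (2.10), and compares the result with brute-force iteration. The function names are hypothetical.

```python
from fractions import Fraction

def solve_by_summation_factor(a, b, c, T0, n):
    """Return T_n for a_n T_n = b_n T_{n-1} + c_n, using (2.10) and (2.11).

    a, b, c are functions of n; every a_k and b_k used must be nonzero.
    """
    if n == 0:
        return Fraction(T0)
    # (2.11) with s_1 = 1; equivalently s_k = s_{k-1} a_{k-1} / b_k.
    s = [None, Fraction(1)]                      # s[0] is unused
    for k in range(2, n + 1):
        s.append(s[k - 1] * a(k - 1) / b(k))
    total = s[1] * b(1) * T0 + sum(s[k] * c(k) for k in range(1, n + 1))
    return total / (s[n] * a(n))                 # (2.10)

def iterate(a, b, c, T0, n):
    """Brute force: T_k = (b_k T_{k-1} + c_k) / a_k for k = 1, ..., n."""
    t = Fraction(T0)
    for k in range(1, n + 1):
        t = (b(k) * t + c(k)) / a(k)
    return t
```

For the Tower of Hanoi, take `a = lambda n: 1`, `b = lambda n: 2`, `c = lambda n: 1`, and `T0 = 0`. The solver should then return $2^n - 1$.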
Let’s apply these ideas to a recurrence that arises in the study of “quicksort,” one of the most important methods for sorting data inside a computer.
The average number of comparison steps made by quicksort when it is applied
to n items in random order satisfies the recurrence

$$C_0 = 0 ; \qquad C_n = n + 1 + \frac{2}{n} \sum_{k=0}^{n-1} C_k , \quad \text{for } n > 0. \qquad (2.12)$$

(Quicksort was
invented by Hoare
in 1962 [189].)


Hmmm. This looks much scarier than the recurrences we’ve seen before; it
includes a sum over all previous values, and a division by n. Trying small
cases gives us some data ($C_1 = 2$, $C_2 = 5$, $C_3 = \frac{26}{3}$) but doesn’t do anything
to quell our fears.

We can, however, reduce the complexity of (2.12) systematically, by first
getting rid of the division and then getting rid of the Σ sign. The idea is to
multiply both sides by n, obtaining the relation

$$n C_n = n^2 + n + 2 \sum_{k=0}^{n-1} C_k , \quad \text{for } n > 0;$$

hence, if we replace n by n − 1,

$$(n-1) C_{n-1} = (n-1)^2 + (n-1) + 2 \sum_{k=0}^{n-2} C_k , \quad \text{for } n - 1 > 0.$$

We can now subtract the second equation from the first, and the Σ sign
disappears:

$$n C_n - (n-1) C_{n-1} = 2n + 2 C_{n-1} , \quad \text{for } n > 1.$$

It turns out that this relation also holds when n = 1, because $C_1 = 2$.
Therefore the original recurrence for $C_n$ reduces to a much simpler one:

$$C_0 = 0 ; \qquad n C_n = (n+1) C_{n-1} + 2n , \quad \text{for } n > 0.$$

Progress. We’re now in a position to apply a summation factor, since this
recurrence has the form of (2.9) with $a_n = n$, $b_n = n + 1$, and $c_n = 2n$.
The general method described on the preceding page tells us to multiply the
recurrence through by some multiple of

$$s_n = \frac{a_{n-1} a_{n-2} \ldots a_1}{b_n b_{n-1} \ldots b_2} = \frac{(n-1)\cdot(n-2)\cdot\ldots\cdot 1}{(n+1)\cdot n\cdot\ldots\cdot 3} = \frac{2}{(n+1)n} .$$
We started with a
Σ in the recurrence,
and worked
hard to get rid of it.
But then after applying
a summation
factor, we came up
with another Σ.
Are sums good, or
bad, or what?

But your spelling is
alwrong.


The solution, according to (2.10), is therefore

$$C_n = 2(n+1) \sum_{k=1}^{n} \frac{1}{k+1} .$$


The sum that remains is very similar to a quantity that arises frequently
in applications. It arises so often, in fact, that we give it a special name and
a special notation:

$$H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} = \sum_{k=1}^{n} \frac{1}{k} . \qquad (2.13)$$


The letter H stands for “harmonic”; $H_n$ is a harmonic number, so called
because the kth harmonic produced by a violin string is the fundamental
tone produced by a string that is 1/k times as long.

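We can complete our study of the quicksort recurrence (2.12) by putting
$C_n$ into closed form; this will be possible if we can express $C_n$ in terms of
$H_n$. The sum in our formula for $C_n$ is

$$\sum_{k=1}^{n} \frac{1}{k+1} = \sum_{1 \le k \le n} \frac{1}{k+1} .$$

We can relate this to $H_n$ without much difficulty by changing k to k − 1 and
revising the boundary conditions:

$$\sum_{1 \le k \le n} \frac{1}{k+1} = \sum_{1 \le k-1 \le n} \frac{1}{k} = \sum_{2 \le k \le n+1} \frac{1}{k} = \Bigl(\sum_{1 \le k \le n} \frac{1}{k}\Bigr) - \frac{1}{1} + \frac{1}{n+1} = H_n - \frac{n}{n+1} .$$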



Alright! We have found the sum needed to complete the solution to (2.12):
The average number of comparisons made by quicksort when it is applied to
n randomly ordered items of data is

$$C_n = 2(n+1) H_n - 2n . \qquad (2.14)$$

As usual, we check that small cases are correct: $C_0 = 0$, $C_1 = 2$, $C_2 = 5$.
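The closed form (2.14) can be checked well beyond the first few cases. This Python sketch (our own, using exact `Fraction` arithmetic) computes $C_n$ straight from the original recurrence (2.12) and compares it with $2(n+1)H_n - 2n$.

```python
from fractions import Fraction

def quicksort_average(n_max):
    """C_0, ..., C_{n_max} computed directly from recurrence (2.12)."""
    C = [Fraction(0)]
    for n in range(1, n_max + 1):
        # At this point sum(C) is C_0 + ... + C_{n-1}.
        C.append(n + 1 + Fraction(2, n) * sum(C))
    return C

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, as in (2.13); H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def closed_form(n):
    """(2.14): C_n = 2(n + 1) H_n - 2n."""
    return 2 * (n + 1) * harmonic(n) - 2 * n
```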
2.3 MANIPULATION OF SUMS

The key to success with sums is an ability to change one Σ into
another that is simpler or closer to some goal. And it’s easy to do this by
learning a few basic rules of transformation and by practicing their use.

Let K be any finite set of integers. Sums over the elements of K can be
transformed by using three simple rules:

$$\sum_{k \in K} c\,a_k = c \sum_{k \in K} a_k ; \qquad \text{(distributive law)} \qquad (2.15)$$

$$\sum_{k \in K} (a_k + b_k) = \sum_{k \in K} a_k + \sum_{k \in K} b_k ; \qquad \text{(associative law)} \qquad (2.16)$$

$$\sum_{k \in K} a_k = \sum_{p(k) \in K} a_{p(k)} . \qquad \text{(commutative law)} \qquad (2.17)$$

The distributive law allows us to move constants in and out of a Σ. The
associative law allows us to break a Σ into two parts, or to combine two Σ’s
into one. The commutative law says that we can reorder the terms in any way
we please; here p(k) is any permutation of the set of all integers. For example,
if K = {−1, 0, +1} and if p(k) = −k, these three laws tell us respectively that

$$c a_{-1} + c a_0 + c a_1 = c(a_{-1} + a_0 + a_1) ; \qquad \text{(distributive law)}$$

$$(a_{-1} + b_{-1}) + (a_0 + b_0) + (a_1 + b_1) = (a_{-1} + a_0 + a_1) + (b_{-1} + b_0 + b_1) ; \qquad \text{(associative law)}$$

$$a_{-1} + a_0 + a_1 = a_1 + a_0 + a_{-1} . \qquad \text{(commutative law)}$$

Gauss’s trick in Chapter 1 can be viewed as an application of these three
basic laws. Suppose we want to compute the general sum of an arithmetic
progression,

$$S = \sum_{0 \le k \le n} (a + bk) .$$

By the commutative law we can replace k by n − k, obtaining

$$S = \sum_{0 \le n-k \le n} \bigl(a + b(n-k)\bigr) = \sum_{0 \le k \le n} (a + bn - bk) .$$

These two equations can be added by using the associative law:

$$2S = \sum_{0 \le k \le n} \bigl((a + bk) + (a + bn - bk)\bigr) = \sum_{0 \le k \le n} (2a + bn) .$$


Not to be confused 
with finance. 


Why not call it 
permutative instead 
of commutative? 


This is something 
like changing vari- 
ables inside an 
integral, but easier. 
“What’s one
and one and one
and one and one
and one and one
and one and one
and one?”

“I don’t know,”
said Alice.
“I lost count.”

“She can’t do
Addition.”

— Lewis Carroll [50]


Additional, eh? 


And we can now apply the distributive law and evaluate a trivial sum:

$$2S = (2a + bn) \sum_{0 \le k \le n} 1 = (2a + bn)(n + 1) .$$

Dividing by 2, we have proved that

$$\sum_{k=0}^{n} (a + bk) = \bigl(a + \tfrac{1}{2} bn\bigr)(n + 1) . \qquad (2.18)$$

The right-hand side can be remembered as the average of the first and last
terms, namely $\frac{1}{2}\bigl(a + (a + bn)\bigr)$, times the number of terms, namely (n + 1).
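Gauss’s pairing argument translates almost line for line into code. In this illustrative Python sketch (the name `gauss_sum` is ours), the reflected list plays the role of p(k) = n − k, and the termwise addition is the associative law.

```python
from fractions import Fraction

def gauss_sum(a, b, n):
    """Sum of a + b k for 0 <= k <= n, following Gauss's pairing argument."""
    terms = [a + b * k for k in range(n + 1)]
    # Commutative law with p(k) = n - k: the same terms in reverse order.
    reflected = [a + b * (n - k) for k in range(n + 1)]
    # Associative law: add the two copies termwise; each pair is 2a + bn.
    pairs = [x + y for x, y in zip(terms, reflected)]
    assert all(pair == 2 * a + b * n for pair in pairs)
    # Distributive law: 2S = (2a + bn)(n + 1), so S = (a + bn/2)(n + 1).
    return Fraction(sum(pairs), 2)
```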

It’s important to bear in mind that the function p(k) in the general 
commutative law (2.17) is supposed to be a permutation of all the integers. In 
other words, for every integer n there should be exactly one integer k such that 
p(k) = n. Otherwise the commutative law might fail; exercise 3 illustrates 
this with a vengeance. Transformations like p(k) = k + c or p(k) = c − k,
where c is an integer constant, are always permutations, so they always work.

On the other hand, we can relax the permutation restriction a little bit:
We need to require only that there be exactly one integer k with p(k) = n
when n is an element of the index set K. If n ∉ K (that is, if n is not in K),
it doesn’t matter how often p(k) = n occurs, because such k don’t take part
in the sum. Thus, for example, we can argue that

$$\sum_{\substack{k \in K \\ k \text{ even}}} a_k = \sum_{\substack{n \in K \\ n \text{ even}}} a_n = \sum_{\substack{2k \in K \\ 2k \text{ even}}} a_{2k} = \sum_{2k \in K} a_{2k} , \qquad (2.19)$$

since there’s exactly one k such that 2k = n when n ∈ K and n is even.

Iverson’s convention, which allows us to obtain the values 0 or 1 from
logical statements in the middle of a formula, can be used together with the
distributive, associative, and commutative laws to deduce additional properties of sums. For example, here is an important rule for combining different
sets of indices: If K and K' are any sets of integers, then

$$\sum_{k \in K} a_k + \sum_{k \in K'} a_k = \sum_{k \in K \cap K'} a_k + \sum_{k \in K \cup K'} a_k . \qquad (2.20)$$

This follows from the general formulas

$$\sum_{k \in K} a_k = \sum_k a_k \, [k \in K] \qquad (2.21)$$

and

$$[k \in K] + [k \in K'] = [k \in K \cap K'] + [k \in K \cup K'] . \qquad (2.22)$$
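Rules (2.20) through (2.22) are easy to spot-check with Python sets. In this sketch a finite window of integers stands in for "all integers". The sets `K1` and `K2` and the summand `a(k)` are arbitrary sample choices, not taken from the text.

```python
def iverson(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

K1 = {1, 2, 3, 5, 8}
K2 = {2, 3, 4, 8, 13}

def a(k):
    return k * k + 1          # an arbitrary summand, for illustration only

window = range(-5, 20)        # stand-in for "all integers"; contains K1 and K2

# (2.22) holds pointwise for every k in the window.
pointwise = all(
    iverson(k in K1) + iverson(k in K2)
    == iverson(k in (K1 & K2)) + iverson(k in (K1 | K2))
    for k in window
)

# (2.20): summing over K and K' equals summing over their intersection and union.
lhs = sum(a(k) for k in K1) + sum(a(k) for k in K2)
rhs = sum(a(k) for k in K1 & K2) + sum(a(k) for k in K1 | K2)

# (2.21): a sum over K is a sum over all k with the factor [k in K].
restricted = sum(a(k) * iverson(k in K1) for k in window)
```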
Typically we use rule (2.20) either to combine two almost-disjoint index sets,
as in

$$\sum_{k=1}^{m} a_k + \sum_{k=m}^{n} a_k = a_m + \sum_{k=1}^{n} a_k , \quad \text{for } 1 \le m \le n;$$

or to split off a single term from a sum, as in

$$\sum_{0 \le k \le n} a_k = a_0 + \sum_{1 \le k \le n} a_k , \quad \text{for } n \ge 0. \qquad (2.23)$$

(The two sides of
(2.20) have been
switched here.)

This operation of splitting off a term is the basis of a perturbation
method that often allows us to evaluate a sum in closed form. The idea
is to start with an unknown sum and call it $S_n$:

$$S_n = \sum_{0 \le k \le n} a_k .$$

(Name and conquer.) Then we rewrite $S_{n+1}$ in two ways, by splitting off both
its last term and its first term:

$$\begin{aligned} S_n + a_{n+1} = \sum_{0 \le k \le n+1} a_k &= a_0 + \sum_{1 \le k \le n+1} a_k \\ &= a_0 + \sum_{1 \le k+1 \le n+1} a_{k+1} \\ &= a_0 + \sum_{0 \le k \le n} a_{k+1} . \end{aligned} \qquad (2.24)$$


Now we can work on this last sum and try to express it in terms of $S_n$. If we
succeed, we obtain an equation whose solution is the sum we seek.

For example, let’s use this approach to find the sum of a general geometric progression,

$$S_n = \sum_{0 \le k \le n} a x^k .$$

The general perturbation scheme in (2.24) tells us that

$$S_n + a x^{n+1} = a x^0 + \sum_{0 \le k \le n} a x^{k+1} ,$$

and the sum on the right is $x \sum_{0 \le k \le n} a x^k = x S_n$ by the distributive law.
Therefore $S_n + a x^{n+1} = a + x S_n$, and we can solve for $S_n$ to obtain

$$\sum_{k=0}^{n} a x^k = \frac{a - a x^{n+1}}{1 - x} , \quad \text{for } x \ne 1. \qquad (2.25)$$

If it’s geometric,
there should be a
geometric proof.

Ah yes, this formula
was drilled into me
in high school.


(When x = 1, the sum is of course simply (n + 1)a.) The right-hand side 
can be remembered as the first term included in the sum minus the first term 
excluded (the term after the last), divided by 1 minus the term ratio. 
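Formula (2.25), including the x = 1 special case, can be tested against brute force with exact rational arithmetic. This sketch (function names are ours) also confirms the Tower of Hanoi fact quoted earlier, $2^{-1} + \cdots + 2^{-n} = 1 - (\frac{1}{2})^n$.

```python
from fractions import Fraction

def geometric_closed(a, x, n):
    """(2.25): sum_{k=0}^{n} a x^k = (a - a x^{n+1}) / (1 - x) for x != 1;
    when x = 1 the sum is simply (n + 1) a."""
    x = Fraction(x)
    if x == 1:
        return (n + 1) * a
    return (a - a * x ** (n + 1)) / (1 - x)

def geometric_brute(a, x, n):
    """Add up the n + 1 terms a x^k one at a time."""
    x = Fraction(x)
    return sum(a * x ** k for k in range(n + 1))
```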

That was almost too easy. Let’s try the perturbation technique on a
slightly more difficult sum,

$$S_n = \sum_{0 \le k \le n} k\,2^k .$$

In this case we have $S_0 = 0$, $S_1 = 2$, $S_2 = 10$, $S_3 = 34$, $S_4 = 98$; what is the
general formula? According to (2.24) we have

$$S_n + (n+1)2^{n+1} = \sum_{0 \le k \le n} (k+1)2^{k+1} ;$$

so we want to express the right-hand sum in terms of $S_n$. Well, we can break
it into two sums with the help of the associative law,

$$\sum_{0 \le k \le n} k\,2^{k+1} + \sum_{0 \le k \le n} 2^{k+1} ,$$

and the first of the remaining sums is $2S_n$. The other sum is a geometric
progression, which equals $(2 - 2^{n+2})/(1 - 2) = 2^{n+2} - 2$ by (2.25). Therefore
we have $S_n + (n+1)2^{n+1} = 2S_n + 2^{n+2} - 2$, and algebra yields

$$\sum_{0 \le k \le n} k\,2^k = (n-1)2^{n+1} + 2 .$$

Now we understand why $S_3 = 34$: It’s 32 + 2, not 2·17.
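Here is a brute-force check of this result, and of the perturbation equation that produced it, as a short Python sketch with illustrative names.

```python
def S(n):
    """S_n = sum_{0 <= k <= n} k 2^k, by brute force."""
    return sum(k * 2 ** k for k in range(n + 1))

def S_closed(n):
    """Closed form from the perturbation method: (n - 1) 2^{n+1} + 2."""
    return (n - 1) * 2 ** (n + 1) + 2

print([S(n) for n in range(5)])   # [0, 2, 10, 34, 98]
```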

A similar derivation with x in place of 2 would have given us the equation
$S_n + (n+1)x^{n+1} = xS_n + (x - x^{n+2})/(1 - x)$; hence we can deduce that

$$\sum_{k=0}^{n} k x^k = \frac{x - (n+1)x^{n+1} + n x^{n+2}}{(1-x)^2} , \quad \text{for } x \ne 1. \qquad (2.26)$$


It’s interesting to note that we could have derived this closed form in a
completely different way, by using elementary techniques of differential calculus. If we start with the equation

$$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$$

and take the derivative of both sides with respect to x, we get

$$\sum_{k=0}^{n} k x^{k-1} = \frac{(1-x)\bigl(-(n+1)x^n\bigr) + 1 - x^{n+1}}{(1-x)^2} = \frac{1 - (n+1)x^n + n x^{n+1}}{(1-x)^2} ,$$

because the derivative of a sum is the sum of the derivatives of its terms. We
will see many more connections between calculus and discrete mathematics
in later chapters.
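Both routes to (2.26) can be verified exactly with rational values of x. In this sketch (helper names are ours), `derivative_form` is the right-hand side of the differentiated equation. Multiplying it by x should reproduce (2.26). The sample values of x are arbitrary, and each avoids x = 1.

```python
from fractions import Fraction

def lhs_226(x, n):
    """Left side of (2.26): sum_{k=0}^{n} k x^k."""
    return sum(k * x ** k for k in range(n + 1))

def rhs_226(x, n):
    """(2.26): (x - (n+1) x^{n+1} + n x^{n+2}) / (1 - x)^2, for x != 1."""
    return (x - (n + 1) * x ** (n + 1) + n * x ** (n + 2)) / (1 - x) ** 2

def derivative_form(x, n):
    """Derivative of (1 - x^{n+1}) / (1 - x), i.e. sum_{k} k x^{k-1}."""
    return (1 - (n + 1) * x ** n + n * x ** (n + 1)) / (1 - x) ** 2

samples = [Fraction(1, 3), Fraction(-5, 2), Fraction(2), Fraction(7, 11)]
```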

2.4 MULTIPLE SUMS 

The terms of a sum might be specified by two or more indices, not
just by one. For example, here’s a double sum of nine terms, governed by two
indices j and k:

$$\begin{aligned} \sum_{1 \le j,k \le 3} a_j b_k = {}& a_1 b_1 + a_1 b_2 + a_1 b_3 \\ &+ a_2 b_1 + a_2 b_2 + a_2 b_3 \\ &+ a_3 b_1 + a_3 b_2 + a_3 b_3 . \end{aligned}$$

We use the same notations and methods for such sums as we do for sums with
a single index. Thus, if P(j, k) is a property of j and k, the sum of all terms
$a_{j,k}$ such that P(j, k) is true can be written in two ways, one of which uses
Iverson’s convention and sums over all pairs of integers j and k:

$$\sum_{P(j,k)} a_{j,k} = \sum_{j,k} a_{j,k} \, [P(j,k)] .$$

Only one Σ sign is needed, although there is more than one index of summation; Σ denotes a sum over all combinations of indices that apply.

We also have occasion to use two Σ’s, when we’re talking about a sum
of sums. For example,

$$\sum_j \sum_k a_{j,k} \, [P(j,k)]$$

is an abbreviation for

$$\sum_j \Bigl(\sum_k a_{j,k} \, [P(j,k)]\Bigr) ,$$

which is the sum, over all integers j, of $\sum_k a_{j,k} [P(j,k)]$, the latter being the
sum over all integers k of all terms $a_{j,k}$ for which P(j, k) is true. In such cases
we say that the double sum is “summed first on k.” A sum that depends on
more than one index can be summed first on any one of its indices.

In this regard we have a basic law called interchanging the order of
summation, which generalizes the associative law (2.16) we saw earlier:

$$\sum_j \sum_k a_{j,k} \, [P(j,k)] = \sum_{P(j,k)} a_{j,k} = \sum_k \sum_j a_{j,k} \, [P(j,k)] . \qquad (2.27)$$

Oh no, a nine-term
governor.

Notice that this
doesn’t mean to
sum over all j ≥ 1
and all k ≤ 3.

Multiple Σ’s are
evaluated right to
left (inside-out).
Who’s panicking?
I think this rule
is fairly obvious
compared to some
of the stuff in
Chapter 1.

The middle term of this law is a sum over two indices. On the left, $\sum_j \sum_k$
stands for summing first on k, then on j. On the right, $\sum_k \sum_j$ stands for
summing first on j, then on k. In practice when we want to evaluate a double
sum in closed form, it’s usually easier to sum it first on one index rather than
on the other; we get to choose whichever is more convenient.

Sums of sums are no reason to panic, but they can appear confusing to
a beginner, so let’s do some more examples. The nine-term sum we began
with provides a good illustration of the manipulation of double sums, because
that sum can actually be simplified, and the simplification process is typical
of what we can do with ΣΣ’s:

$$\begin{aligned}
\sum_{1 \le j,k \le 3} a_j b_k &= \sum_{j,k} a_j b_k \, [1 \le j,k \le 3] = \sum_{j,k} a_j b_k \, [1 \le j \le 3][1 \le k \le 3] \\
&= \sum_j \sum_k a_j b_k \, [1 \le j \le 3][1 \le k \le 3] \\
&= \sum_j a_j \, [1 \le j \le 3] \sum_k b_k \, [1 \le k \le 3] \\
&= \sum_j a_j \, [1 \le j \le 3] \Bigl(\sum_k b_k \, [1 \le k \le 3]\Bigr) \\
&= \Bigl(\sum_j a_j \, [1 \le j \le 3]\Bigr) \Bigl(\sum_k b_k \, [1 \le k \le 3]\Bigr) \\
&= \Bigl(\sum_{j=1}^{3} a_j\Bigr) \Bigl(\sum_{k=1}^{3} b_k\Bigr) .
\end{aligned}$$

The first line here denotes a sum of nine terms in no particular order. The
second line groups them in threes, $(a_1b_1 + a_1b_2 + a_1b_3) + (a_2b_1 + a_2b_2 + a_2b_3) + (a_3b_1 + a_3b_2 + a_3b_3)$. The third line uses the distributive law to
factor out the a’s, since $a_j$ and $[1 \le j \le 3]$ do not depend on k; this gives
$a_1(b_1 + b_2 + b_3) + a_2(b_1 + b_2 + b_3) + a_3(b_1 + b_2 + b_3)$. The fourth line is
the same as the third, but with a redundant pair of parentheses thrown in
so that the fifth line won’t look so mysterious. The fifth line factors out the
$(b_1 + b_2 + b_3)$ that occurs for each value of j: $(a_1 + a_2 + a_3)(b_1 + b_2 + b_3)$.
The last line is just another way to write the previous line. This method of
derivation can be used to prove a general distributive law,

$$\sum_{\substack{j \in J \\ k \in K}} a_j b_k = \Bigl(\sum_{j \in J} a_j\Bigr) \Bigl(\sum_{k \in K} b_k\Bigr) , \qquad (2.28)$$

valid for all sets of indices J and K.
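Here is the general distributive law (2.28) in a few lines of Python. The index sets and values are arbitrary samples. Dictionaries stand in for the families $a_j$ (j ∈ J) and $b_k$ (k ∈ K).

```python
a = {1: 2, 2: -1, 3: 5}            # a_j for j in J = {1, 2, 3}
b = {4: 3, 7: 1, 9: -2, 10: 6}     # b_k for k in K = {4, 7, 9, 10}

double_sum = sum(a[j] * b[k] for j in a for k in b)   # left side of (2.28)
product = sum(a.values()) * sum(b.values())          # right side of (2.28)
print(double_sum, product)   # 48 48
```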

The basic law (2.27) for interchanging the order of summation has many
variations, which arise when we want to restrict the ranges of the indices
instead of summing over all integers j and k. These variations come in two
flavors, vanilla and rocky road. First, the vanilla version:

$$\sum_{j \in J} \sum_{k \in K} a_{j,k} = \sum_{\substack{j \in J \\ k \in K}} a_{j,k} = \sum_{k \in K} \sum_{j \in J} a_{j,k} . \qquad (2.29)$$

This is just another way to write (2.27), since the Iversonian $[j \in J, k \in K]$
factors into $[j \in J][k \in K]$. The vanilla-flavored law applies whenever the ranges
of j and k are independent of each other.

The rocky-road formula for interchange is a little trickier. It applies when
the range of an inner sum depends on the index variable of the outer sum:

$$\sum_{j \in J} \sum_{k \in K(j)} a_{j,k} = \sum_{k \in K'} \sum_{j \in J'(k)} a_{j,k} . \qquad (2.30)$$

Here the sets J, K(j), K', and J'(k) must be related in such a way that

$$[j \in J]\,[k \in K(j)] = [k \in K']\,[j \in J'(k)] .$$

A factorization like this is always possible in principle, because we can let
J = K' be the set of all integers and K(j) = J'(k) be the basic property P(j, k)
that governs a double sum. But there are important special cases where the
sets J, K(j), K', and J'(k) have a simple form. These arise frequently in
applications. For example, here’s a particularly useful factorization:

$$[1 \le j \le n]\,[j \le k \le n] = [1 \le j \le k \le n] = [1 \le k \le n]\,[1 \le j \le k] . \qquad (2.31)$$

This Iversonian equation allows us to write

$$\sum_{j=1}^{n} \sum_{k=j}^{n} a_{j,k} = \sum_{1 \le j \le k \le n} a_{j,k} = \sum_{k=1}^{n} \sum_{j=1}^{k} a_{j,k} . \qquad (2.32)$$

One of these two sums of sums is usually easier to evaluate than the other;
we can use (2.32) to switch from the hard one to the easy one.
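This small Python check (our own illustration) confirms two things: the two rocky-road orders in (2.32) agree for an arbitrary summand $a_{j,k}$, and the factorization (2.31) holds pointwise on a window of integers. The choice of `term` and of n = 7 is arbitrary.

```python
from fractions import Fraction

def term(j, k):
    return Fraction(j, k) + j * k        # an arbitrary a_{j,k}, for illustration

n = 7
rows_first = sum(term(j, k) for j in range(1, n + 1) for k in range(j, n + 1))
all_pairs = sum(term(j, k) for j in range(1, n + 1) for k in range(1, n + 1) if j <= k)
cols_first = sum(term(j, k) for k in range(1, n + 1) for j in range(1, k + 1))

# (2.31), checked pointwise on a window a little wider than 1..n.
window = range(-2, n + 3)
factorization_ok = all(
    ((1 <= j <= n) and (j <= k <= n))
    == (1 <= j <= k <= n)
    == ((1 <= k <= n) and (1 <= j <= k))
    for j in window for k in window
)
```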

Let’s apply these ideas to a useful example. Consider the array

$$\begin{bmatrix} a_1a_1 & a_1a_2 & a_1a_3 & \ldots & a_1a_n \\ a_2a_1 & a_2a_2 & a_2a_3 & \ldots & a_2a_n \\ a_3a_1 & a_3a_2 & a_3a_3 & \ldots & a_3a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_na_1 & a_na_2 & a_na_3 & \ldots & a_na_n \end{bmatrix}$$

of $n^2$ products $a_j a_k$. Our goal will be to find a simple formula for
the sum of all elements on or above the main diagonal of this array. Because
$a_j a_k = a_k a_j$, the array is symmetrical about its main diagonal; therefore
$S_\urcorner$ will be approximately half the sum of all the elements (except for a fudge
factor that takes account of the main diagonal).

(Now is a good
time to do warmup
exercises 4 and 6.)

(Or to check out
the Snickers bar
languishing in the
freezer.)

Does rocky road
have fudge in it?

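Such considerations motivate the following manipulations. We have

$$S_\urcorner = \sum_{1 \le j \le k \le n} a_j a_k = \sum_{1 \le k \le j \le n} a_k a_j = \sum_{1 \le k \le j \le n} a_j a_k = S_\llcorner ,$$

because we can rename (j, k) as (k, j). Furthermore, since

$$[1 \le j \le k \le n] + [1 \le k \le j \le n] = [1 \le j,k \le n] + [1 \le j = k \le n] ,$$

we have

$$2S_\urcorner = S_\urcorner + S_\llcorner = \sum_{1 \le j,k \le n} a_j a_k + \sum_{1 \le j = k \le n} a_j a_k .$$

The first sum is $\bigl(\sum_{j=1}^{n} a_j\bigr)\bigl(\sum_{k=1}^{n} a_k\bigr) = \bigl(\sum_{k=1}^{n} a_k\bigr)^2$, by the general distributive law (2.28). The second sum is $\sum_{k=1}^{n} a_k^2$. Therefore we have

$$S_\urcorner = \sum_{1 \le j \le k \le n} a_j a_k = \frac{1}{2}\biggl(\Bigl(\sum_{k=1}^{n} a_k\Bigr)^2 + \sum_{k=1}^{n} a_k^2\biggr) , \qquad (2.33)$$

an expression for the upper triangular sum in terms of simpler single sums.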
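Here is a numerical check of (2.33) on arbitrary sample data. The list is our own choice, and Python indexes it from 0 rather than 1.

```python
from fractions import Fraction

a = [3, -1, 4, 1, -5, 9]      # arbitrary sample values a_1, ..., a_n
n = len(a)

# Upper triangle including the diagonal: pairs with j <= k.
upper = sum(a[j] * a[k] for j in range(n) for k in range(j, n))
# Right-hand side of (2.33).
formula = Fraction(sum(a) ** 2 + sum(x * x for x in a), 2)
```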
Encouraged by such success, let’s look at another double sum:

$$S = \sum_{1 \le j < k \le n} (a_k - a_j)(b_k - b_j) .$$

Again we have symmetry when j and k are interchanged:

$$S = \sum_{1 \le k < j \le n} (a_j - a_k)(b_j - b_k) = \sum_{1 \le k < j \le n} (a_k - a_j)(b_k - b_j) .$$

So we can add S to itself, making use of the identity

$$[1 \le j < k \le n] + [1 \le k < j \le n] = [1 \le j,k \le n] - [1 \le j = k \le n]$$

to conclude that

$$2S = \sum_{1 \le j,k \le n} (a_j - a_k)(b_j - b_k) - \sum_{1 \le j = k \le n} (a_j - a_k)(b_j - b_k) .$$
The second sum here is zero; what about the first? It expands into four
separate sums, each of which is vanilla flavored:

$$\begin{aligned} \sum_{1 \le j,k \le n} a_j b_j - \sum_{1 \le j,k \le n} a_j b_k - \sum_{1 \le j,k \le n} a_k b_j + \sum_{1 \le j,k \le n} a_k b_k &= 2 \sum_{1 \le j,k \le n} a_k b_k - 2 \sum_{1 \le j,k \le n} a_j b_k \\ &= 2n \sum_{1 \le k \le n} a_k b_k - 2 \Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) . \end{aligned}$$


In the last step both sums have been simplified according to the general
distributive law (2.28). If the manipulation of the first sum seems mysterious,
here it is again in slow motion:

$$\begin{aligned}
2 \sum_{1 \le j,k \le n} a_k b_k &= 2 \sum_{1 \le k \le n} \sum_{1 \le j \le n} a_k b_k \\
&= 2 \sum_{1 \le k \le n} a_k b_k \sum_{1 \le j \le n} 1 \\
&= 2 \sum_{1 \le k \le n} a_k b_k \, n = 2n \sum_{1 \le k \le n} a_k b_k .
\end{aligned}$$

An index variable that doesn’t appear in the summand (here j) can simply
be eliminated if we multiply what’s left by the size of that variable’s index
set (here n).

Returning to where we left off, we can now divide everything by 2 and
rearrange things to obtain an interesting formula:

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) = n \sum_{k=1}^{n} a_k b_k - \sum_{1 \le j < k \le n} (a_k - a_j)(b_k - b_j) . \qquad (2.34)$$


This identity yields Chebyshev’s monotonic inequalities as a special case:

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) \le n \sum_{k=1}^{n} a_k b_k , \quad \text{if } a_1 \le \cdots \le a_n \text{ and } b_1 \le \cdots \le b_n ;$$

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) \ge n \sum_{k=1}^{n} a_k b_k , \quad \text{if } a_1 \le \cdots \le a_n \text{ and } b_1 \ge \cdots \ge b_n .$$

(In general, if $a_1 \le \cdots \le a_n$ and if p is a permutation of {1, ..., n}, it’s
not difficult to prove that the largest value of $\sum_{k=1}^{n} a_k b_{p(k)}$ occurs when
$b_{p(1)} \le \cdots \le b_{p(n)}$, and the smallest value occurs when $b_{p(1)} \ge \cdots \ge b_{p(n)}$.)

(Chebyshev [58]
actually proved the
analogous result
for integrals
instead of sums,

$$\Bigl(\int_a^b f(x)\,dx\Bigr)\cdot\Bigl(\int_a^b g(x)\,dx\Bigr) \le (b-a)\cdot\Bigl(\int_a^b f(x)g(x)\,dx\Bigr),$$

if f(x) and g(x)
are monotone
nondecreasing
functions.)
Multiple summation has an interesting connection with the general operation
of changing the index of summation in single sums. We know by the
commutative law that

$$\sum_{k\in K}a_k=\sum_{p(k)\in K}a_{p(k)},$$

if p(k) is any permutation of the integers. But what happens when we replace 
k by f(j), where f is an arbitrary function 

$$f\colon J\to K$$

that takes an integer $j\in J$ into an integer $f(j)\in K$? The general formula for
index replacement is

$$\sum_{j\in J}a_{f(j)}=\sum_{k\in K}a_k\,\#f^-(k),\qquad(2.35)$$

where $\#f^-(k)$ stands for the number of elements in the set

$$f^-(k)=\{\,j\mid f(j)=k\,\},$$

that is, the number of values of $j\in J$ such that $f(j)$ equals $k$.

It’s easy to prove (2.35) by interchanging the order of summation,

$$\sum_{j\in J}a_{f(j)}=\sum_{\substack{j\in J\\k\in K}}a_k\,[f(j)=k]=\sum_{k\in K}a_k\sum_{j\in J}[f(j)=k],$$


My other math
teacher calls this a
“bijection”; maybe
I’ll learn to love
that word some day.

And then again . . .


since $\sum_{j\in J}[f(j)=k]=\#f^-(k)$. In the special case that $f$ is a one-to-one
correspondence between $J$ and $K$, we have $\#f^-(k)=1$ for all $k$, and the
general formula (2.35) reduces to

$$\sum_{j\in J}a_{f(j)}=\sum_{f(j)\in K}a_{f(j)}=\sum_{k\in K}a_k.$$

This is the commutative law (2.17) we had before, slightly disguised. 
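Formula (2.35) can also be checked mechanically by counting preimages. The sketch below uses hypothetical data: $K=\{0,\ldots,9\}$, $J=\{-5,\ldots,5\}$, and a deliberately many-to-one map $f(j)=j^2\bmod 10$. It compares both sides, then repeats the comparison with a bijection to recover the commutative law.

```python
from collections import Counter

def left_side(a, f, J):
    """Sum over j in J of a[f(j)]."""
    return sum(a[f(j)] for j in J)

def right_side(a, f, J):
    """Sum over k of a[k] * #f^-(k), where #f^-(k) = |{j in J : f(j) = k}|.
    Values of k with an empty preimage contribute 0, so they are skipped."""
    preimage_size = Counter(f(j) for j in J)
    return sum(a[k] * size for k, size in preimage_size.items())

# Hypothetical data: K = {0, ..., 9}, J = {-5, ..., 5}, and a many-to-one f.
a = {k: k * k + 1 for k in range(10)}
J = range(-5, 6)
f = lambda j: (j * j) % 10
assert left_side(a, f, J) == right_side(a, f, J)

# A bijection J' -> K gives #f^-(k) = 1 for every k: the commutative law.
g = lambda j: j + 5
assert left_side(a, g, range(-5, 5)) == sum(a.values())
```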

Our examples of multiple sums so far have all involved general terms like
$a_k$ or $b_k$. But this book is supposed to be concrete, so let’s take a look at a
multiple sum that involves actual numbers:


$$S_n=\sum_{1\le j<k\le n}\frac{1}{k-j}.$$

For example, $S_1=0$; $S_2=1$; $S_3=\frac{1}{2-1}+\frac{1}{3-1}+\frac{1}{3-2}=\frac52$.

Watch out: the
authors seem to
think that j, k,
and n are “actual
numbers.”
The normal way to evaluate a double sum is to sum first on $j$ or first
on $k$, so let’s explore both options.


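$$\begin{aligned}
S_n&=\sum_{1\le k\le n}\;\sum_{1\le j<k}\frac{1}{k-j} &&\text{summing first on } j\\
&=\sum_{1\le k\le n}\;\sum_{1\le k-j<k}\frac{1}{j} &&\text{replacing } j \text{ by } k-j\\
&=\sum_{1\le k\le n}\;\sum_{0<j\le k-1}\frac{1}{j} &&\text{simplifying the bounds on } j\\
&=\sum_{1\le k\le n}H_{k-1} &&\text{by (2.13), the definition of } H_{k-1}\\
&=\sum_{1\le k+1\le n}H_k &&\text{replacing } k \text{ by } k+1\\
&=\sum_{0\le k<n}H_k. &&\text{simplifying the bounds on } k
\end{aligned}$$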


Alas! We don’t know how to get a sum of harmonic numbers into closed form. 
If we try summing first the other way, we get 


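$$\begin{aligned}
S_n&=\sum_{1\le j\le n}\;\sum_{j<k\le n}\frac{1}{k-j} &&\text{summing first on } k\\
&=\sum_{1\le j\le n}\;\sum_{j<k+j\le n}\frac{1}{k} &&\text{replacing } k \text{ by } k+j\\
&=\sum_{1\le j\le n}\;\sum_{0<k\le n-j}\frac{1}{k} &&\text{simplifying the bounds on } k\\
&=\sum_{1\le j\le n}H_{n-j} &&\text{by (2.13), the definition of } H_{n-j}\\
&=\sum_{1\le n-j\le n}H_j &&\text{replacing } j \text{ by } n-j\\
&=\sum_{0\le j<n}H_j. &&\text{simplifying the bounds on } j
\end{aligned}$$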


We’re back at the same impasse. 

But there’s another way to proceed, if we replace $k$ by $k+j$ before
deciding to reduce $S_n$ to a sum of sums:


$$\begin{aligned}
S_n&=\sum_{1\le j<k\le n}\frac{1}{k-j} &&\text{recopying the given sum}\\
&=\sum_{1\le j<k+j\le n}\frac{1}{k} &&\text{replacing } k \text{ by } k+j\\
&=\sum_{1\le k\le n}\;\sum_{1\le j\le n-k}\frac{1}{k} &&\text{summing first on } j\\
&=\sum_{1\le k\le n}\frac{n-k}{k} &&\text{the sum on } j \text{ is trivial}\\
&=\sum_{1\le k\le n}\frac{n}{k}-\sum_{1\le k\le n}1 &&\text{by the associative law}\\
&=n\Bigl(\sum_{1\le k\le n}\frac{1}{k}\Bigr)-n &&\text{by gosh}\\
&=nH_n-n. &&\text{by (2.13), the definition of } H_n
\end{aligned}$$

Get out the whip.

It’s smart to say
k ≤ n instead of
k ≤ n − 1 here.
Simple bounds save
energy.


Aha! We’ve found $S_n$. Combining this with the false starts we made gives us
a further identity as a bonus:

$$\sum_{0\le k<n}H_k=nH_n-n.\qquad(2.36)$$

We can understand the trick that worked here in two ways, one algebraic
and one geometric. (1) Algebraically, if we have a double sum whose terms involve
$k+f(j)$, where $f$ is an arbitrary function, this example indicates that it’s
a good idea to try replacing $k$ by $k-f(j)$ and summing on $j$. (2) Geometrically,
we can look at this particular sum $S_n$ as follows, in the case $n=4$:


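$$\begin{array}{c|ccc}
 & k=2 & k=3 & k=4\\ \hline
j=1 & \frac11 & \frac12 & \frac13\\[0.5ex]
j=2 & & \frac11 & \frac12\\[0.5ex]
j=3 & & & \frac11\\[0.5ex]
j=4 & & &
\end{array}$$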


Our first attempts, summing first on $j$ (by columns) or on $k$ (by rows), gave
us $H_1+H_2+H_3=H_3+H_2+H_1$. The winning idea was essentially to sum
by diagonals, getting $\frac31+\frac22+\frac13$.
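All three ways of adding up $S_n$, and the bonus identity (2.36), can be confirmed with exact rational arithmetic. The sketch below is our own illustration. `S_direct` follows the definition. `S_diagonals` uses the diagonal idea: a difference $d=k-j$ occurs $n-d$ times. The function names are hypothetical.

```python
from fractions import Fraction

def H(n):
    """Harmonic number H_n = 1/1 + ... + 1/n, exactly (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def S_direct(n):
    """S_n = sum over 1 <= j < k <= n of 1/(k - j)."""
    return sum((Fraction(1, k - j) for k in range(1, n + 1) for j in range(1, k)),
               Fraction(0))

def S_diagonals(n):
    """Summing by diagonals: each difference d = k - j occurs n - d times."""
    return sum((Fraction(n - d, d) for d in range(1, n + 1)), Fraction(0))

for n in range(12):
    closed = n * H(n) - n
    assert S_direct(n) == S_diagonals(n) == closed
    assert sum((H(k) for k in range(n)), Fraction(0)) == closed   # identity (2.36)
```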


2.5 GENERAL METHODS 

Now let’s consolidate what we’ve learned, by looking at a single 
example from several different angles. On the next few pages we’re going to 
try to find a closed form for the sum of the first n squares, which we’ll call $\Box_n$:

$$\Box_n=\sum_{0\le k\le n}k^2,\quad\text{for } n\ge0.\qquad(2.37)$$

We’ll see that there are at least seven different ways to solve this problem, 
and in the process we’ll learn useful strategies for attacking sums in general. 
First, as usual, we look at some small cases.


$$\begin{array}{r|rrrrrrrrrrrrr}
n&0&1&2&3&4&5&6&7&8&9&10&11&12\\ \hline
n^2&0&1&4&9&16&25&36&49&64&81&100&121&144\\
\Box_n&0&1&5&14&30&55&91&140&204&285&385&506&650
\end{array}$$


No closed form for $\Box_n$ is immediately evident; but when we do find one, we
can use these values as a check.
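The table of small cases is easy to regenerate, which is handy when checking each of the methods that follow. This is a minimal sketch of our own. `squares_table` is a hypothetical helper that keeps a running total of squares.

```python
def squares_table(limit):
    """Rows (n, n^2, box_n) for 0 <= n <= limit, where box_n = 0^2 + ... + n^2."""
    rows, running = [], 0
    for n in range(limit + 1):
        running += n * n
        rows.append((n, n * n, running))
    return rows

for n, sq, box in squares_table(12):
    print(f"{n:>3} {sq:>5} {box:>5}")
```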


Method 0: You could look it up. 

A problem like the sum of the first n squares has probably been solved 
before, so we can most likely find the solution in a handy reference book. 
Sure enough, page 36 of the CRC Standard Mathematical Tables [28] has the 
answer: 


$$\Box_n=\frac{n(n+1)(2n+1)}{6},\quad\text{for } n\ge0.\qquad(2.38)$$


Just to make sure we haven’t misread it, we check that this formula correctly
gives $\Box_5=5\cdot6\cdot11/6=55$. Incidentally, page 36 of the CRC Tables has
further information about the sums of cubes, . . . , tenth powers.

The definitive reference for mathematical formulas is the Handbook of 
Mathematical Functions, edited by Abramowitz and Stegun [2]. Pages 813–814
of that book list the values of $\Box_n$ for $n\le100$; and pages 804 and 809
exhibit formulas equivalent to (2.38), together with the analogous formulas 
for sums of cubes, . . . , fifteenth powers, with or without alternating signs. 

But the best source for answers to questions about sequences is an amaz- 
ing little book called the Handbook of Integer Sequences, by Sloane [330], 
which lists thousands of sequences by their numerical values. If you come 
up with a recurrence that you suspect has already been studied, all you have 
to do is compute enough terms to distinguish your recurrence from other fa- 
mous ones; then chances are you’ll find a pointer to the relevant literature in 
Sloane’s Handbook. For example, 1, 5, 14, 30, ... turns out to be Sloane’s 
sequence number 1574, and it’s called the sequence of “square pyramidal
numbers” (because there are $\Box_n$ balls in a pyramid that has a square base of
$n^2$ balls). Sloane gives three references, one of which is to the handbook of
Abramowitz and Stegun that we’ve already mentioned. 

Still another way to probe the world’s store of accumulated mathematical 
wisdom is to use a computer program (such as Axiom, MACSYMA, Maple, or 
Mathematica) that provides tools for symbolic manipulation. Such programs 
are indispensable, especially for people who need to deal with large formulas. 

It’s good to be familiar with standard sources of information, because 
they can be extremely helpful. But Method 0 isn’t really consistent with the 
spirit of this book, because we want to know how to figure out the answers 


(Harder sums 
can be found 
in Hansen’s 
comprehensive 
table [178].) 
Or, at least to
problems having 
the same answers 
as problems that 
other people have 
decided to consider. 


by ourselves. The look-up method is limited to problems that other people 
have decided are worth considering; a new problem won’t be there. 

Method 1: Guess the answer, prove it by induction. 

Perhaps a little bird has told us the answer to a problem, or we have 
arrived at a closed form by some other less-than-rigorous means. Then we 
merely have to prove that it is correct. 

We might, for example, have noticed that the values of $\Box_n$ have rather
small prime factors, so we may have come up with formula (2.38) as something 
that works for all small values of n. We might also have conjectured the 
equivalent formula 


□ 


n 


n(n+ j)(u + 1 ) 
3 


for n ^ 0, 


(2-39) 


which is nicer because it’s easier to remember. The preponderance of the 
evidence supports (2.39), but we must prove our conjectures beyond all rea- 
sonable doubt. Mathematical induction was invented for this purpose. 

“Well, Your Honor, we know that $\Box_0=0=0(0+\frac12)(0+1)/3$, so the basis
is easy. For the induction, suppose that $n>0$, and assume that (2.39) holds
when $n$ is replaced by $n-1$. Since


$$\Box_n=\Box_{n-1}+n^2,$$

we have

$$\begin{aligned}
3\Box_n&=(n-1)(n-\tfrac12)(n)+3n^2\\
&=(n^3-\tfrac32n^2+\tfrac12n)+3n^2\\
&=(n^3+\tfrac32n^2+\tfrac12n)\\
&=n(n+\tfrac12)(n+1).
\end{aligned}$$

Therefore (2.39) indeed holds, beyond a reasonable doubt, for all $n\ge0$.”
Judge Wapner, in his infinite wisdom, agrees. 

Induction has its place, and it is somewhat more defensible than trying 
to look up the answer. But it’s still not really what we’re seeking. All of 
the other sums we have evaluated so far in this chapter have been conquered 
without induction; we should likewise be able to determine a sum like $\Box_n$
from scratch. Flashes of inspiration should not be necessary. We should be 
able to do sums even on our less creative days. 

Method 2: Perturb the sum. 

So let’s go back to the perturbation method that worked so well for the 
geometric progression (2.25). We extract the first and last terms of $\Box_{n+1}$ in
order to get an equation for $\Box_n$:


$$\begin{aligned}
\Box_n+(n+1)^2&=\sum_{0\le k\le n}(k+1)^2=\sum_{0\le k\le n}(k^2+2k+1)\\
&=\sum_{0\le k\le n}k^2+2\sum_{0\le k\le n}k+\sum_{0\le k\le n}1\\
&=\Box_n+2\sum_{0\le k\le n}k+(n+1).
\end{aligned}$$


Oops, the $\Box_n$’s cancel each other. Occasionally, despite our best efforts, the
perturbation method produces something like $\Box_n=\Box_n$, so we lose.

On the other hand, this derivation is not a total loss; it does reveal a way
to sum the first n integers in closed form,

$$2\sum_{0\le k\le n}k=(n+1)^2-(n+1),$$


even though we’d hoped to discover the sum of the first integers squared. Could
it be that if we start with the sum of the integers cubed, which we might
call $\boxdot_n$, we will get an expression for the integers squared? Let’s try it.


$$\begin{aligned}
\boxdot_n+(n+1)^3&=\sum_{0\le k\le n}(k+1)^3=\sum_{0\le k\le n}(k^3+3k^2+3k+1)\\
&=\boxdot_n+3\Box_n+3\frac{(n+1)n}{2}+(n+1).
\end{aligned}$$


Sure enough, the $\boxdot_n$’s cancel, and we have enough information to determine
$\Box_n$ without relying on induction:

$$\begin{aligned}
3\Box_n&=(n+1)^3-3(n+1)n/2-(n+1)\\
&=(n+1)(n^2+2n+1-\tfrac32n-1)=(n+1)(n+\tfrac12)n.
\end{aligned}$$
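To confirm that the cube perturbation really pins down $\Box_n$, the sketch below (our own, with hypothetical helper names) checks the perturbation identity before cancellation. It then checks the resulting closed forms (2.39) and (2.38) for many $n$.

```python
from fractions import Fraction

def box(n):
    """Sum of squares 0^2 + 1^2 + ... + n^2."""
    return sum(k * k for k in range(n + 1))

def cube_sum(n):
    """Sum of cubes 0^3 + 1^3 + ... + n^3."""
    return sum(k ** 3 for k in range(n + 1))

for n in range(60):
    # Perturbation identity before the sums of cubes cancel; (n+1)n is even.
    assert cube_sum(n) + (n + 1) ** 3 == (
        cube_sum(n) + 3 * box(n) + 3 * (n + 1) * n // 2 + (n + 1))
    # Solving for box(n) gives (2.39) and its equivalent (2.38).
    assert box(n) == Fraction(n) * (n + Fraction(1, 2)) * (n + 1) / 3
    assert box(n) == n * (n + 1) * (2 * n + 1) // 6
```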

Method 3: Build a repertoire.

A slight generalization of the recurrence (2.7) will also suffice for summands
involving $n^2$. The solution to


$$\begin{aligned}
R_0&=\alpha;\\
R_n&=R_{n-1}+\beta+\gamma n+\delta n^2,\quad\text{for } n>0,
\end{aligned}\qquad(2.40)$$

will be of the general form

$$R_n=A(n)\alpha+B(n)\beta+C(n)\gamma+D(n)\delta;\qquad(2.41)$$


Seems more like a 
draw. 


Method 2′:
Perturb your TA.


and we have already determined $A(n)$, $B(n)$, and $C(n)$, because (2.40) is the
same as (2.7) when $\delta=0$. If we now plug in $R_n=n^3$, we find that $n^3$ is the
The horizontal scale
here is ten times the 
vertical scale. 


solution when $\alpha=0$, $\beta=1$, $\gamma=-3$, $\delta=3$. Hence

$$3D(n)-3C(n)+B(n)=n^3;$$

this determines $D(n)$.

We’re interested in the sum $\Box_n$, which equals $\Box_{n-1}+n^2$; thus we get
$\Box_n=R_n$ if we set $\alpha=\beta=\gamma=0$ and $\delta=1$ in (2.41). Consequently
$\Box_n=D(n)$. We needn’t do the algebra to compute $D(n)$ from $B(n)$ and
$C(n)$, since we already know what the answer will be; but doubters among us
should be reassured to find that

$$3D(n)=n^3+3C(n)-B(n)=n^3+3\frac{(n+1)n}{2}-n=n(n+\tfrac12)(n+1).$$
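Here is a sketch of the repertoire method under one stated assumption. We take $A(n)=1$, $B(n)=n$, and $C(n)=(n^2+n)/2$ as the solution of (2.7) from earlier in the chapter, which is not reproduced in this excerpt. The code derives $D(n)$ from $3D(n)-3C(n)+B(n)=n^3$. It then checks the general form (2.41) against direct iteration of (2.40) for a few hypothetical parameter choices, and checks that $D(n)=\Box_n$.

```python
from fractions import Fraction

def R(n, alpha, beta, gamma, delta):
    """Iterate (2.40): R_0 = alpha; R_n = R_{n-1} + beta + gamma*n + delta*n^2."""
    r = alpha
    for m in range(1, n + 1):
        r += beta + gamma * m + delta * m * m
    return r

# Assumed from the solution of (2.7): A(n) = 1, B(n) = n, C(n) = (n^2 + n)/2.
def A(n): return 1
def B(n): return n
def C(n): return Fraction(n * n + n, 2)
def D(n): return (n ** 3 + 3 * C(n) - B(n)) / 3   # from 3D(n) - 3C(n) + B(n) = n^3

for n in range(30):
    assert D(n) == sum(k * k for k in range(n + 1))          # D(n) = box_n
    for params in [(2, -1, 3, 5), (0, 1, -3, 3), (7, 0, 0, 1)]:
        a, b, c, d = params
        assert R(n, *params) == A(n) * a + B(n) * b + C(n) * c + D(n) * d
```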

Method 4: Replace sums by integrals. 

People who have been raised on calculus instead of discrete mathematics
tend to be more familiar with $\int$ than with $\sum$, so they find it natural to try
changing $\sum$ to $\int$. One of our goals in this book is to become so comfortable
with $\sum$ that we’ll think $\int$ is more difficult than $\sum$ (at least for exact results).
But still, it’s a good idea to explore the relation between $\sum$ and $\int$, since
summation and integration are based on very similar ideas.

In calculus, an integral can be regarded as the area under a curve, and we 
can approximate this area by adding up the areas of long, skinny rectangles 
that touch the curve. We can also go the other way if a collection of long,
skinny rectangles is given: Since $\Box_n$ is the sum of the areas of rectangles
whose sizes are $1\times1$, $1\times4$, . . . , $1\times n^2$, it is approximately equal to the area
under the curve $f(x)=x^2$ between 0 and $n$.



The area under this curve is $\int_0^n x^2\,dx=n^3/3$; therefore we know that $\Box_n$ is
approximately $\frac13n^3$.
One way to use this fact is to examine the error in the approximation,
$E_n=\Box_n-\frac13n^3$. Since $\Box_n$ satisfies the recurrence $\Box_n=\Box_{n-1}+n^2$, we find
that $E_n$ satisfies the simpler recurrence

$$\begin{aligned}
E_n=\Box_n-\tfrac13n^3&=\Box_{n-1}+n^2-\tfrac13n^3=E_{n-1}+\tfrac13(n-1)^3+n^2-\tfrac13n^3\\
&=E_{n-1}+n-\tfrac13.
\end{aligned}$$

Another way to pursue the integral approach is to find a formula for E n by 
summing the areas of the wedge-shaped error terms. We have 


$$\Box_n-\int_0^n x^2\,dx=\sum_{k=1}^n\Bigl(k^2-\int_{k-1}^k x^2\,dx\Bigr)=\sum_{k=1}^n\Bigl(k^2-\frac{k^3-(k-1)^3}{3}\Bigr)=\sum_{k=1}^n\bigl(k-\tfrac13\bigr).$$


This is for people 
addicted to calculus. 


Either way, we could find $E_n$ and then $\Box_n$.
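Both routes to the error term can be checked exactly. The sketch below (our own; `E` is a hypothetical helper) verifies the recurrence $E_n=E_{n-1}+n-\frac13$. It also verifies the wedge-sum form $E_n=\sum_{k=1}^n(k-\frac13)$ using exact fractions.

```python
from fractions import Fraction

def E(n):
    """Error E_n = box_n - n^3/3 of the integral approximation."""
    return sum(k * k for k in range(n + 1)) - Fraction(n ** 3, 3)

for n in range(1, 40):
    assert E(n) == E(n - 1) + n - Fraction(1, 3)                      # recurrence
    assert E(n) == sum(k - Fraction(1, 3) for k in range(1, n + 1))   # wedge sum
```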

Method 5: Expand and contract. 

Yet another way to discover a closed form for $\Box_n$ is to replace the original
sum by a seemingly more complicated double sum that can actually be
simplified if we massage it properly:


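$$\begin{aligned}
\Box_n&=\sum_{1\le k\le n}k^2=\sum_{1\le j\le k\le n}k\\
&=\sum_{1\le j\le n}\;\sum_{j\le k\le n}k\\
&=\sum_{1\le j\le n}\Bigl(\frac{j+n}{2}\Bigr)(n-j+1)\\
&=\frac12\sum_{1\le j\le n}\bigl(n(n+1)+j-j^2\bigr)\\
&=\tfrac12n^2(n+1)+\tfrac14n(n+1)-\tfrac12\Box_n=\tfrac12n(n+\tfrac12)(n+1)-\tfrac12\Box_n.
\end{aligned}$$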

Going from a single sum to a double sum may appear at first to be a backward 
step, but it’s actually progress, because it produces sums that are easier to 
work with. We can’t expect to solve every problem by continually simplifying, 
simplifying, and simplifying: You can’t scale the highest mountain peaks by 
climbing only uphill. 
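Each step of the expand-and-contract derivation can be checked numerically. The sketch below is our own, with hypothetical helper names. It checks the double sum, the closed form of the inner sum, and the final equation in which $\Box_n$ appears on both sides.

```python
from fractions import Fraction

def box_double_sum(n):
    """box_n written as the double sum over 1 <= j <= k <= n of k."""
    return sum(k for j in range(1, n + 1) for k in range(j, n + 1))

def inner_closed(n, j):
    """Inner sum over j <= k <= n of k, in closed form ((j + n)/2)(n - j + 1)."""
    return Fraction(j + n, 2) * (n - j + 1)

for n in range(25):
    box = sum(k * k for k in range(1, n + 1))
    assert box_double_sum(n) == box
    assert sum(inner_closed(n, j) for j in range(1, n + 1)) == box
    # Last line of the derivation: box = (1/2)n(n + 1/2)(n + 1) - (1/2)box.
    assert box == Fraction(1, 2) * n * (n + Fraction(1, 2)) * (n + 1) - Fraction(box, 2)
```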


(The last step here 
is something like 
the last step of 
the perturbation 
method, because 
we get an equation 
with the unknown 
quantity on both 
sides.) 


Method 6: Use finite calculus.

Method 7: Use generating functions. 

Stay tuned for still more exciting calculations of $\Box_n=\sum_{k=0}^n k^2$, as we
learn further techniques in the next section and in later chapters.
As opposed to a
cassette function. 


Math power. 


2.6 FINITE AND INFINITE CALCULUS 


We’ve learned a variety of ways to deal with sums directly. Now it’s 
time to acquire a broader perspective, by looking at the problem of summa- 
tion from a higher level. Mathematicians have developed a “finite calculus,” 
analogous to the more traditional infinite calculus, by which it’s possible to 
approach summation in a nice, systematic fashion. 

Infinite calculus is based on the properties of the derivative operator D, 
defined by 


$$Df(x)=\lim_{h\to0}\frac{f(x+h)-f(x)}{h}.$$

Finite calculus is based on the properties of the difference operator $\Delta$, defined
by

$$\Delta f(x)=f(x+1)-f(x).\qquad(2.42)$$

This is the finite analog of the derivative in which we restrict ourselves to
positive integer values of $h$. Thus, $h=1$ is the closest we can get to the
“limit” as $h\to0$, and $\Delta f(x)$ is the value of $(f(x+h)-f(x))/h$ when $h=1$.

The symbols $D$ and $\Delta$ are called operators because they operate on
functions to give new functions; they are functions of functions that produce
functions. If $f$ is a suitably smooth function of real numbers to real numbers,
then $Df$ is also a function from reals to reals. And if $f$ is any real-to-real
function, so is $\Delta f$. The values of the functions $Df$ and $\Delta f$ at a point $x$ are
given by the definitions above.

Early on in calculus we learn how $D$ operates on the powers $f(x)=x^m$.
In such cases $Df(x)=mx^{m-1}$. We can write this informally with $f$ omitted,

$$D(x^m)=mx^{m-1}.$$

It would be nice if the $\Delta$ operator would produce an equally elegant result;
unfortunately it doesn’t. We have, for example,

$$\Delta(x^3)=(x+1)^3-x^3=3x^2+3x+1.$$

But there is a type of “mth power” that does transform nicely under $\Delta$,
and this is what makes finite calculus interesting. Such newfangled mth
powers are defined by the rule

$$x^{\underline m}=\overbrace{x(x-1)\ldots(x-m+1)}^{m\text{ factors}},\quad\text{integer } m\ge0.\qquad(2.43)$$

Notice the little straight line under the m; this implies that the m factors
are supposed to go down and down, stepwise. There’s also a corresponding
definition where the factors go up and up:


$$x^{\overline m}=\overbrace{x(x+1)\ldots(x+m-1)}^{m\text{ factors}},\quad\text{integer } m\ge0.\qquad(2.44)$$


When $m=0$, we have $x^{\underline0}=x^{\overline0}=1$, because a product of no factors is
conventionally taken to be 1 (just as a sum of no terms is conventionally 0).

The quantity $x^{\underline m}$ is called “x to the m falling” if we have to read it
aloud; similarly, $x^{\overline m}$ is “x to the m rising.” These functions are also called
falling factorial powers and rising factorial powers, since they are closely
related to the factorial function $n!=n(n-1)\ldots(1)$. In fact, $n!=n^{\underline n}=1^{\overline n}$.

Several other notations for factorial powers appear in the mathematical
literature, notably “Pochhammer’s symbol” $(x)_m$ for $x^{\overline m}$ or $x^{\underline m}$; notations
like $x^{(m)}$ or $x_{(m)}$ are also seen for $x^{\underline m}$. But the underline/overline convention
is catching on, because it’s easy to write, easy to remember, and free of
redundant parentheses.

Falling powers $x^{\underline m}$ are especially nice with respect to $\Delta$. We have

$$\begin{aligned}
\Delta(x^{\underline m})&=(x+1)^{\underline m}-x^{\underline m}\\
&=(x+1)x\ldots(x-m+2)-x\ldots(x-m+2)(x-m+1)\\
&=mx(x-1)\ldots(x-m+2),
\end{aligned}$$


Mathematical
terminology is
sometimes crazy:
Pochhammer [293]
actually used the
notation $(x)_m$
for the binomial
coefficient $\binom{x}{m}$, not
for factorial powers.


hence the finite calculus has a handy law to match $D(x^m)=mx^{m-1}$:

$$\Delta(x^{\underline m})=mx^{\underline{m-1}}.\qquad(2.45)$$


This is the basic factorial fact. 
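The definitions (2.42) to (2.45) translate directly into code. In the sketch below, `delta`, `falling`, and `rising` are our own hypothetical helpers; exact fractions are used so that non-integer $x$ can be tested too. The block checks the basic factorial fact, the identity $n!=n^{\underline n}=1^{\overline n}$, and the less tidy $\Delta(x^3)$ computed above.

```python
from fractions import Fraction

def delta(f):
    """Difference operator (2.42): (Δf)(x) = f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def falling(x, m):
    """Falling factorial power x(x-1)...(x-m+1) for integer m >= 0, as in (2.43)."""
    result = 1
    for i in range(m):
        result *= x - i
    return result

def rising(x, m):
    """Rising factorial power x(x+1)...(x+m-1) for integer m >= 0, as in (2.44)."""
    result = 1
    for i in range(m):
        result *= x + i
    return result

for m in range(1, 7):
    d = delta(lambda x, m=m: falling(x, m))
    for x in [Fraction(-3), Fraction(1, 2), Fraction(5), Fraction(7, 3)]:
        assert d(x) == m * falling(x, m - 1)     # basic factorial fact (2.45)

assert falling(5, 5) == rising(1, 5) == 120       # n! = n^(n falling) = 1^(n rising)
assert delta(lambda x: x ** 3)(4) == 3 * 4 ** 2 + 3 * 4 + 1   # Δ(x^3) is less tidy
```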

The operator $D$ of infinite calculus has an inverse, the anti-derivative
(or integration) operator $\int$. The Fundamental Theorem of Calculus relates $D$
to $\int$:

$$g(x)=Df(x)\quad\text{if and only if}\quad\int g(x)\,dx=f(x)+C.$$


Here $\int g(x)\,dx$, the indefinite integral of $g(x)$, is the class of functions whose
derivative is $g(x)$. Analogously, $\Delta$ has as an inverse, the anti-difference (or
summation) operator $\sum$; and there’s another Fundamental Theorem:

$$g(x)=\Delta f(x)\quad\text{if and only if}\quad\sum g(x)\,\delta x=f(x)+C.\qquad(2.46)$$

Here $\sum g(x)\,\delta x$, the indefinite sum of $g(x)$, is the class of functions whose
difference is $g(x)$. (Notice that the lowercase $\delta$ relates to uppercase $\Delta$ as
$d$ relates to $D$.) The “C” for indefinite integrals is an arbitrary constant; the
“C” for indefinite sums is any function $p(x)$ such that $p(x+1)=p(x)$. For

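“Quemadmodum
ad differentiam
denotandam usi
sumus signo Δ,
ita summam
indicabimus
signo Σ.
... ex quo æquatio
z = Δy, si
invertatur, dabit
quoque y = Σz + C.”

(“Just as we have
used the sign Δ to
denote the difference,
so we shall indicate
the sum by the
sign Σ. ... whence
the equation z = Δy,
if inverted, will also
give y = Σz + C.”)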

— L. Euler [110] 





You call this a 
punch line? 


example, $C$ might be the periodic function $a+b\sin2\pi x$; such functions get
washed out when we take differences, just as constants get washed out when
we take derivatives. At integer values of $x$, the function $C$ is constant.

Now we’re almost ready for the punch line. Infinite calculus also has
definite integrals: If $g(x)=Df(x)$, then

$$\int_a^b g(x)\,dx=f(x)\Big|_a^b=f(b)-f(a).$$


Therefore finite calculus, ever mimicking its more famous cousin, has definite
sums: If $g(x)=\Delta f(x)$, then

$$\sum\nolimits_a^b g(x)\,\delta x=f(x)\Big|_a^b=f(b)-f(a).\qquad(2.47)$$


This formula gives a meaning to the notation $\sum\nolimits_a^b g(x)\,\delta x$, just as the previous
formula defines $\int_a^b g(x)\,dx$.

But what does $\sum\nolimits_a^b g(x)\,\delta x$ really mean, intuitively? We’ve defined it by
analogy, not by necessity. We want the analogy to hold, so that we can easily
remember the rules of finite calculus; but the notation will be useless if we
don’t understand its significance. Let’s try to deduce its meaning by looking
first at some special cases, assuming that $g(x)=\Delta f(x)=f(x+1)-f(x)$. If
$b=a$, we have


$$\sum\nolimits_a^a g(x)\,\delta x=f(a)-f(a)=0.$$


Next, if $b=a+1$, the result is

$$\sum\nolimits_a^{a+1}g(x)\,\delta x=f(a+1)-f(a)=g(a).$$

More generally, if $b$ increases by 1, we have

$$\begin{aligned}
\sum\nolimits_a^{b+1}g(x)\,\delta x-\sum\nolimits_a^b g(x)\,\delta x&=\bigl(f(b+1)-f(a)\bigr)-\bigl(f(b)-f(a)\bigr)\\
&=f(b+1)-f(b)=g(b).
\end{aligned}$$


These observations, and mathematical induction, allow us to deduce exactly
what $\sum\nolimits_a^b g(x)\,\delta x$ means in general, when $a$ and $b$ are integers with $b\ge a$:

$$\sum\nolimits_a^b g(x)\,\delta x=\sum_{k=a}^{b-1}g(k)=\sum_{a\le k<b}g(k),\quad\text{for integers } b\ge a.\qquad(2.48)$$


In other words, the definite sum is the same as an ordinary sum with limits, 
but excluding the value at the upper limit. 
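Rule (2.48), together with the reversal rule derived a little further on, can be seen concretely with one anti-difference pair. In the sketch below, `definite_sum` is a hypothetical helper implementing (2.47). The pair $g(x)=2x$, $f(x)=x^{\underline2}=x(x-1)$ follows from (2.45) with $m=2$.

```python
def definite_sum(f, a, b):
    """Definite sum of g from a to b, defined by (2.47) as f(b) - f(a),
    where f is any anti-difference of g."""
    return f(b) - f(a)

# g(x) = 2x has anti-difference f(x) = x(x-1), the falling power x^(2 falling).
g = lambda x: 2 * x
f = lambda x: x * (x - 1)

for a in range(-5, 6):
    for b in range(a, a + 8):
        # (2.48): an ordinary sum that omits the value at the upper limit b
        assert definite_sum(f, a, b) == sum(g(k) for k in range(a, b))
        # when the limits are swapped, the sign flips
        assert definite_sum(f, b, a) == -definite_sum(f, a, b)
```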





Let’s try to recap this in a slightly different way. Suppose we’ve been 
given an unknown sum that’s supposed to be evaluated in closed form, and 
suppose we can write it in the form $\sum_{a\le k<b}g(k)=\sum\nolimits_a^b g(x)\,\delta x$. The theory
of finite calculus tells us that we can express the answer as $f(b)-f(a)$, if
we can only find an indefinite sum or anti-difference function $f$ such that
$g(x)=f(x+1)-f(x)$. One way to understand this principle is to write
$\sum_{a\le k<b}g(k)$ out in full, using the three-dots notation:

$$\begin{aligned}
\sum_{a\le k<b}\bigl(f(k+1)-f(k)\bigr)&=\bigl(f(a+1)-f(a)\bigr)+\bigl(f(a+2)-f(a+1)\bigr)+\cdots\\
&\quad+\bigl(f(b-1)-f(b-2)\bigr)+\bigl(f(b)-f(b-1)\bigr).
\end{aligned}$$


Everything on the right-hand side cancels, except $f(b)-f(a)$; so $f(b)-f(a)$
is the value of the sum. (Sums of the form $\sum_{a\le k<b}\bigl(f(k+1)-f(k)\bigr)$ are
often called telescoping, by analogy with a collapsed telescope, because the 
thickness of a collapsed telescope is determined solely by the outer radius of 
the outermost tube and the inner radius of the innermost tube.) 

But rule (2.48) applies only when $b\ge a$; what happens if $b<a$? Well,
(2.47) says that we must have

$$\sum\nolimits_a^b g(x)\,\delta x=f(b)-f(a)=-\bigl(f(a)-f(b)\bigr)=-\sum\nolimits_b^a g(x)\,\delta x.$$

This is analogous to the corresponding equation for definite integration. A
similar argument proves $\sum\nolimits_a^b+\sum\nolimits_b^c=\sum\nolimits_a^c$, the summation analog of the identity
$\int_a^b+\int_b^c=\int_a^c$. In full,

$$\sum\nolimits_a^b g(x)\,\delta x+\sum\nolimits_b^c g(x)\,\delta x=\sum\nolimits_a^c g(x)\,\delta x,\qquad(2.49)$$

for all integers $a$, $b$, and $c$.

At this point a few of us are probably starting to wonder what all these 
parallels and analogies buy us. Well, for one, definite summation gives us a
simple way to compute sums of falling powers: The basic laws (2.45), (2.47),
and (2.48) imply the general law

$$\sum_{0\le k<n}k^{\underline m}=\frac{k^{\underline{m+1}}}{m+1}\bigg|_0^n=\frac{n^{\underline{m+1}}}{m+1},\quad\text{for integers } m,n\ge0.\qquad(2.50)$$


And all this time 
I thought it was 
telescoping because 
it collapsed from a 
very long expression 
to a very short one. 


Others have been 
wondering this for 
some time now. 


This formula is easy to remember because it’s so much like the familiar
$\int_0^n x^m\,dx=n^{m+1}/(m+1)$.
With friends like
this . . .


In particular, when $m=1$ we have $k^{\underline1}=k$, so the principles of finite
calculus give us an easy way to remember the fact that

$$\sum_{0\le k<n}k=\frac{n^{\underline2}}{2}=n(n-1)/2.$$

The definite-sum method also gives us an inkling that sums over the range
$0\le k<n$ often turn out to be simpler than sums over $1\le k\le n$; the former
are just $f(n)-f(0)$, while the latter must be evaluated as $f(n+1)-f(1)$.

Ordinary powers can also be summed in this new way, if we first express 
them in terms of falling powers. For example, 

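$$k^2=k^{\underline2}+k^{\underline1},$$

hence

$$\sum_{0\le k<n}k^2=\frac{n^{\underline3}}{3}+\frac{n^{\underline2}}{2}=\tfrac13n(n-1)(n-2+\tfrac32)=\tfrac13n(n-\tfrac12)(n-1).$$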

Replacing $n$ by $n+1$ gives us yet another way to compute the value of our
old friend $\Box_n=\sum_{0\le k\le n}k^2$ in closed form.

Gee, that was pretty easy. In fact, it was easier than any of the umpteen 
other ways that beat this formula to death in the previous section. So let’s 
try to go up a notch, from squares to cubes: A simple calculation shows that 

$$k^3=k^{\underline3}+3k^{\underline2}+k^{\underline1}.$$


(It’s always possible to convert between ordinary powers and factorial powers 
by using Stirling numbers, which we will study in Chapter 6.) Thus 


$$\sum_{a\le k<b}k^3=\Bigl(\frac{k^{\underline4}}{4}+k^{\underline3}+\frac{k^{\underline2}}{2}\Bigr)\bigg|_a^b.$$
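The general law (2.50), the squares formula, and the cubes formula just derived can all be checked with one small falling-power helper. This is a sketch of our own. `falling` and `cube_antidiff` are hypothetical names, and the test ranges are arbitrary.

```python
from fractions import Fraction

def falling(x, m):
    """Falling power x(x-1)...(x-m+1) for integer m >= 0."""
    result = 1
    for i in range(m):
        result *= x - i
    return result

for n in range(20):
    for m in range(6):
        # (2.50): sum_{0<=k<n} k^(m falling) = n^(m+1 falling) / (m+1)
        assert sum(falling(k, m) for k in range(n)) == Fraction(falling(n, m + 1), m + 1)
    # squares via k^2 = k^(2) + k^(1): the sum is n(n - 1/2)(n - 1)/3
    assert sum(k * k for k in range(n)) == Fraction(n) * (n - Fraction(1, 2)) * (n - 1) / 3

def cube_antidiff(x):
    """Anti-difference of k^3 = k^(3) + 3k^(2) + k^(1)."""
    return Fraction(falling(x, 4), 4) + falling(x, 3) + Fraction(falling(x, 2), 2)

for a in range(-4, 5):
    for b in range(a, a + 10):
        assert sum(k ** 3 for k in range(a, b)) == cube_antidiff(b) - cube_antidiff(a)
```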


Falling powers are therefore very nice for sums. But do they have any 
other redeeming features? Must we convert our old friendly ordinary powers 
to falling powers before summing, but then convert back before we can do 
anything else? Well, no, it’s often possible to work directly with factorial 
powers, because they have additional properties. For example, just as we 
have $(x+y)^2=x^2+2xy+y^2$, it turns out that $(x+y)^{\underline2}=x^{\underline2}+2x^{\underline1}y^{\underline1}+y^{\underline2}$,
and the same analogy holds between $(x+y)^m$ and $(x+y)^{\underline m}$. (This “factorial
binomial theorem” is proved in exercise 5.37.) 

So far we’ve considered only falling powers that have nonnegative expo- 
nents. To extend the analogies with ordinary powers to negative exponents, 
we need an appropriate definition of $x^{\underline m}$ for $m<0$. Looking at the sequence

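$$\begin{aligned}
x^{\underline3}&=x(x-1)(x-2),\\
x^{\underline2}&=x(x-1),\\
x^{\underline1}&=x,\\
x^{\underline0}&=1,
\end{aligned}$$

we notice that to get from $x^{\underline3}$ to $x^{\underline2}$ to $x^{\underline1}$ to $x^{\underline0}$ we divide by $x-2$, then
by $x-1$, then by $x$. It seems reasonable (if not imperative) that we should
divide by $x+1$ next, to get from $x^{\underline0}$ to $x^{\underline{-1}}$, thereby making $x^{\underline{-1}}=1/(x+1)$.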
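Continuing, the first few negative-exponent falling powers are

$$x^{\underline{-1}}=\frac{1}{x+1},\qquad x^{\underline{-2}}=\frac{1}{(x+1)(x+2)},\qquad x^{\underline{-3}}=\frac{1}{(x+1)(x+2)(x+3)},$$

and our general definition for negative falling powers is

$$x^{\underline{-m}}=\frac{1}{(x+1)(x+2)\ldots(x+m)},\quad\text{for } m>0.\qquad(2.51)$$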


(It’s also possible to define falling powers for real or even complex m, but we 
will defer that until Chapter 5.) 

With this definition, falling powers have additional nice properties. Perhaps
the most important is a general law of exponents, analogous to the law

$$x^{m+n}=x^mx^n$$

for ordinary powers. The falling-power version is

$$x^{\underline{m+n}}=x^{\underline m}(x-m)^{\underline n},\quad\text{integers } m \text{ and } n.\qquad(2.52)$$


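For example, $x^{\underline{2+3}}=x^{\underline2}(x-2)^{\underline3}$; and with a negative $n$ we have

$$x^{\underline{2-3}}=x^{\underline2}(x-2)^{\underline{-3}}=x(x-1)\frac{1}{(x-1)x(x+1)}=\frac{1}{x+1}=x^{\underline{-1}}.$$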


If we had chosen to define $x^{\underline{-1}}$ as $1/x$ instead of as $1/(x+1)$, the law of
exponents (2.52) would have failed in cases like $m=-1$ and $n=1$. In fact,
we could have used (2.52) to tell us exactly how falling powers ought to be
defined in the case of negative exponents, by setting $m=-n$. When an
existing notation is being extended to cover more cases, it’s always best to 
formulate definitions in such a way that general laws continue to hold. 
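The extended definition can be stress-tested against the law of exponents (2.52). The sketch below also checks the difference property for negative exponents, which the text verifies by hand next. It is our own illustration: `falling` is a hypothetical helper covering both (2.43) and (2.51). The sample points are chosen by us to avoid zero denominators, and the last line shows that the rejected choice $x^{\underline{-1}}=1/x$ breaks (2.52).

```python
from fractions import Fraction

def falling(x, m):
    """Falling power for any integer m, following (2.43) and (2.51).

    For m < 0 this is 1/((x+1)(x+2)...(x+|m|)); the caller must pick x so
    that no factor in that denominator is zero."""
    result = Fraction(1)
    if m >= 0:
        for i in range(m):
            result *= x - i
        return result
    for i in range(1, -m + 1):
        result /= x + i
    return result

# Sample points chosen (by us) to avoid poles of the negative powers.
samples = [Fraction(1, 3), Fraction(5, 2), Fraction(-7, 2), Fraction(11)]
for x in samples:
    for m in range(-4, 5):
        # difference property: Δ x^(m) = m x^(m-1), now also for m < 0
        assert falling(x + 1, m) - falling(x, m) == m * falling(x, m - 1)
        for n in range(-4, 5):
            # law of exponents (2.52): x^(m+n) = x^(m) (x - m)^(n)
            assert falling(x, m + n) == falling(x, m) * falling(x - m, n)

# With the rejected definition x^(-1) = 1/x, (2.52) fails for m = -1, n = 1:
x = Fraction(5, 2)
assert (1 / x) * falling(x + 1, 1) != falling(x, 0)
```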


How can a complex 
number be even? 


Laws have their 
exponents and their 
detractors. 
Now let’s make sure that the crucial difference property holds for our
newly defined falling powers. Does $\Delta x^{\underline m}=mx^{\underline{m-1}}$ when $m<0$? If $m=-2$,
for example, the difference is

$$\Delta x^{\underline{-2}}=\frac{1}{(x+2)(x+3)}-\frac{1}{(x+1)(x+2)}=\frac{(x+1)-(x+3)}{(x+1)(x+2)(x+3)}=-2x^{\underline{-3}}.$$

Yes, it works! A similar argument applies for all $m<0$.

Therefore the summation property (2.50) holds for negative falling powers
as well as positive ones, as long as no division by zero occurs:

$$\sum\nolimits_a^b x^{\underline m}\,\delta x=\frac{x^{\underline{m+1}}}{m+1}\bigg|_a^b,\quad\text{for } m\ne-1.$$

But what about when $m=-1$? Recall that for integration we use

$$\int_a^b x^{-1}\,dx=\ln x\Big|_a^b$$

when $m=-1$. We’d like to have a finite analog of $\ln x$; in other words, we
seek a function $f(x)$ such that

$$x^{\underline{-1}}=\frac{1}{x+1}=\Delta f(x)=f(x+1)-f(x).$$


It’s not too hard to see that

$$f(x)=\frac11+\frac12+\cdots+\frac1x$$


0.577 exactly?
Maybe they mean
1/√3.

Then again,
maybe not.


is such a function, when $x$ is an integer, and this quantity is just the harmonic
number $H_x$ of (2.13). Thus $H_x$ is the discrete analog of the continuous $\ln x$.
(We will define $H_x$ for noninteger $x$ in Chapter 6, but integer values are good
enough for present purposes. We’ll also see in Chapter 9 that, for large $x$, the
value of $H_x-\ln x$ is approximately $0.577+1/(2x)$. Hence $H_x$ and $\ln x$ are not
only analogous, their values usually differ by less than 1.)

We can now give a complete description of the sums of falling powers: 



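$$\sum\nolimits_a^b x^{\underline m}\,\delta x=\begin{cases}\dfrac{x^{\underline{m+1}}}{m+1}\bigg|_a^b, & \text{if } m\ne-1;\\[2ex] H_x\Big|_a^b, & \text{if } m=-1.\end{cases}\qquad(2.53)$$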
This formula indicates why harmonic numbers tend to pop up in the solutions
to discrete problems like the analysis of quicksort, just as so-called natural 
logarithms arise naturally in the solutions to continuous problems. 
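The $m=-1$ case of (2.53) is easy to confirm for integer limits. The sketch below (our own; `H` is a hypothetical helper) checks that $\Delta H_x=x^{\underline{-1}}$ and that the corresponding definite sums are differences of harmonic numbers. It also prints $H_x-\ln x$ next to the rough estimate $0.577+1/(2x)$ quoted above, without claiming any particular digits.

```python
from fractions import Fraction
import math

def H(x):
    """Harmonic number H_x for an integer x >= 0 (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, x + 1)), Fraction(0))

for x in range(30):
    # ΔH_x = H_{x+1} - H_x = 1/(x+1), which is x^(-1 falling)
    assert H(x + 1) - H(x) == Fraction(1, x + 1)

for a in range(10):
    for b in range(a, 20):
        # (2.53), case m = -1: sum over a <= k < b of k^(-1) equals H_b - H_a
        assert sum((Fraction(1, k + 1) for k in range(a, b)), Fraction(0)) == H(b) - H(a)

# Compare H_x - ln x with the rough estimate 0.577 + 1/(2x) quoted in the text.
for x in (10, 100, 1000):
    print(x, float(H(x)) - math.log(x), 0.577 + 1 / (2 * x))
```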

Now that we’ve found an analog for $\ln x$, let’s see if there’s one for $e^x$.
What function $f(x)$ has the property that $\Delta f(x)=f(x)$, corresponding to the
identity $De^x=e^x$? Easy:

$$f(x+1)-f(x)=f(x)\iff f(x+1)=2f(x);$$

so we’re dealing with a simple recurrence, and we can take $f(x)=2^x$ as the
discrete exponential function.

The difference of $c^x$ is also quite simple, for arbitrary $c$, namely

$$\Delta(c^x)=c^{x+1}-c^x=(c-1)c^x.$$

Hence the anti-difference of $c^x$ is $c^x/(c-1)$, if $c\ne1$. This fact, together with
the fundamental laws (2.47) and (2.48), gives us a tidy way to understand the
general formula for the sum of a geometric progression:

$$\sum_{a\le k<b}c^k=\sum\nolimits_a^b c^x\,\delta x=\frac{c^x}{c-1}\bigg|_a^b=\frac{c^b-c^a}{c-1},\quad\text{for } c\ne1.$$
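The geometric-progression formula is a one-line consequence of the anti-difference $c^x/(c-1)$. The sketch below is our own; `geometric_sum` is a hypothetical name. It evaluates the sum that way and compares the result with brute-force addition, including negative limits and a negative ratio.

```python
from fractions import Fraction

def geometric_sum(c, a, b):
    """sum over a <= k < b of c^k, via the anti-difference c^x/(c-1); needs c != 1."""
    c = Fraction(c)
    antidiff = lambda x: c ** x / (c - 1)
    return antidiff(b) - antidiff(a)

for c in [2, 3, Fraction(1, 2), -5]:
    for a in range(-3, 4):
        for b in range(a, a + 6):
            assert geometric_sum(c, a, b) == sum(Fraction(c) ** k for k in range(a, b))
```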


Every time we encounter a function $f$ that might be useful as a closed
form, we can compute its difference $\Delta f=g$; then we have a function $g$ whose
indefinite sum $\sum g(x)\,\delta x$ is known. Table 55 is the beginning of a table of
difference/anti-difference pairs useful for summation.

Despite all the parallels between continuous and discrete math, some 
continuous notions have no discrete analog. For example, the chain rule of 
infinite calculus is a handy rule for the derivative of a function of a function; 
but there’s no corresponding chain rule of finite calculus, because there’s no 
nice form for $\Delta f(g(x))$. Discrete change-of-variables is hard, except in certain
cases like the replacement of $x$ by $c\pm x$.

However, $\Delta\bigl(f(x)\,g(x)\bigr)$ does have a fairly nice form, and it provides us
with a rule for summation by parts, the finite analog of what infinite calculus
calls integration by parts. Let’s recall that the formula

$$D(uv)=u\,Dv+v\,Du$$

of infinite calculus leads to the rule for integration by parts,

$$\int u\,Dv=uv-\int v\,Du,$$


‘Table 55’ is on 
page 55. Get it? 
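Table 55 What’s the difference?

$$\begin{array}{ll|ll}
f=\sum g & \Delta f=g & f=\sum g & \Delta f=g\\ \hline
x^{\underline0}=1 & 0 & 2^x & 2^x\\
x^{\underline1}=x & 1 & c^x & (c-1)c^x\\
x^{\underline2}=x(x-1) & 2x & c^x/(c-1) & c^x\\
x^{\underline m} & mx^{\underline{m-1}} & cf & c\,\Delta f\\
x^{\underline{m+1}}/(m+1) & x^{\underline m} & f+g & \Delta f+\Delta g\\
H_x & x^{\underline{-1}}=1/(x+1) & fg & f\,\Delta g+Eg\,\Delta f
\end{array}$$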


after integration and rearranging terms; we can do a similar thing in finite 
calculus. 

We start by applying the difference operator to the product of two func- 
tions u(x) and v(x): 

$$\begin{aligned}
\Delta\bigl(u(x)\,v(x)\bigr)&=u(x+1)\,v(x+1)-u(x)\,v(x)\\
&=u(x+1)\,v(x+1)-u(x)\,v(x+1)\\
&\qquad+u(x)\,v(x+1)-u(x)\,v(x)\\
&=u(x)\,\Delta v(x)+v(x+1)\,\Delta u(x).
\end{aligned}\qquad(2.54)$$

This formula can be put into a convenient form using the shift operator E, 
defined by 

$$Ef(x)=f(x+1).$$

Substituting $Ev(x)$ for $v(x+1)$ yields a compact rule for the difference of a
product:

$$\Delta(uv)=u\,\Delta v+Ev\,\Delta u.\qquad(2.55)$$


Infinite calculus
avoids E here by
letting 1 → 0.


(The E is a bit of a nuisance, but it makes the equation correct.) Taking 
the indefinite sum on both sides of this equation, and rearranging its terms, 
yields the advertised rule for summation by parts: 


$$\sum u\,\Delta v=uv-\sum Ev\,\Delta u.\qquad(2.56)$$


I guess $e^x=2^x$,
for small values
of 1.


As with infinite calculus, limits can be placed on all three terms, making the 
indefinite sums definite. 
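The product rule (2.55) and summation by parts (2.56), with limits attached, can be checked for arbitrary test functions. In the sketch below, `delta`, `shift`, and `definite` are hypothetical helpers of our own. The functions `u` and `v` are made-up examples; any functions would do.

```python
from fractions import Fraction

def delta(f):
    """Difference operator: (Δf)(x) = f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def shift(f):
    """Shift operator E: (Ef)(x) = f(x+1)."""
    return lambda x: f(x + 1)

def definite(h, a, b):
    """Definite sum of h from a to b for b >= a, i.e. the sum over a <= k < b (2.48)."""
    return sum((h(k) for k in range(a, b)), Fraction(0))

# Hypothetical test functions; any u and v would do.
u = lambda x: Fraction(x * x - 3 * x)
v = lambda x: Fraction(2) ** x + x
uv = lambda x: u(x) * v(x)

for a in range(-3, 3):
    # product rule (2.55): Δ(uv) = u Δv + Ev Δu
    assert delta(uv)(a) == u(a) * delta(v)(a) + shift(v)(a) * delta(u)(a)
    for b in range(a, a + 8):
        # summation by parts (2.56) with limits: sum u Δv = uv|_a^b - sum Ev Δu
        left = definite(lambda x: u(x) * delta(v)(x), a, b)
        right = (uv(b) - uv(a)) - definite(lambda x: shift(v)(x) * delta(u)(x), a, b)
        assert left == right
```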

This rule is useful when the sum on the left is harder to evaluate than the 
one on the right. Let’s look at an example. The function $\int xe^x\,dx$ is typically
integrated by parts; its discrete analog is $\sum x2^x\,\delta x$, which we encountered
earlier this chapter in the form $\sum_{k=0}^n k2^k$. To sum this by parts, we let
$u(x)=x$ and $\Delta v(x)=2^x$; hence $\Delta u(x)=1$, $v(x)=2^x$, and $Ev(x)=2^{x+1}$.

Plugging into (2.56) gives 

^x 2 x 6 x = x 2 x - ^ 2 X+1 6x = x 2 x - 2 X+1 + C. 

And we can use this to evaluate the sum we did before, by attaching limits:

$$\begin{aligned}
\sum_{k=0}^{n} k2^k &= \sum\nolimits_0^{n+1} x2^x\,\delta x = x2^x - 2^{x+1}\Big|_0^{n+1}\\
&= \bigl((n+1)2^{n+1} - 2^{n+2}\bigr) - (0\cdot2^0 - 2^1) = (n-1)2^{n+1} + 2.
\end{aligned}$$
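A brute-force comparison in Python (function names ours) confirms the closed form just obtained for small n:

```python
def direct_sum(n: int) -> int:
    """Brute-force value of the sum of k * 2**k for 0 <= k <= n."""
    return sum(k * 2 ** k for k in range(n + 1))


def by_parts(n: int) -> int:
    """Closed form from summation by parts: (n - 1) * 2**(n + 1) + 2."""
    return (n - 1) * 2 ** (n + 1) + 2


for n in range(12):
    assert direct_sum(n) == by_parts(n), n

print(by_parts(10))  # 18434
```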

It’s easier to find the sum this way than to use the perturbation method,
because we don’t have to think.

The ultimate goal
of mathematics
is to eliminate all
need for intelligent
thought.

We stumbled across a formula for $\sum_{0\le k<n} H_k$ earlier in this chapter,
and counted ourselves lucky. But we could have found our formula (2.36)
systematically, if we had known about summation by parts. Let’s demonstrate
this assertion by tackling a sum that looks even harder, $\sum_{0\le k<n} kH_k$. The
solution is not difficult if we are guided by analogy with $\int x\ln x\,dx$: We take
$u(x) = H_x$ and $\Delta v(x) = x = x^{\underline{1}}$, hence $\Delta u(x) = x^{\underline{-1}}$, $v(x) = x^{\underline{2}}/2$,
$Ev(x) = (x+1)^{\underline{2}}/2$, and we have

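$$\begin{aligned}
\sum xH_x\,\delta x &= \frac{x^{\underline{2}}}{2}H_x - \sum \frac{(x+1)^{\underline{2}}}{2}\,x^{\underline{-1}}\,\delta x\\
&= \frac{x^{\underline{2}}}{2}H_x - \frac12\sum x^{\underline{1}}\,\delta x\\
&= \frac{x^{\underline{2}}}{2}H_x - \frac{x^{\underline{2}}}{4} + C.
\end{aligned}$$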



(In going from the first line to the second, we’ve combined two falling powers
$(x+1)^{\underline{2}}\,x^{\underline{-1}}$ by using the law of exponents (2.52) with $m = -1$ and $n = 2$.)

Now we can attach limits and conclude that 

$$\sum_{0\le k<n} kH_k = \sum\nolimits_0^n xH_x\,\delta x = \frac{n^{\underline{2}}}{2}\Bigl(H_n - \frac12\Bigr). \qquad (2.57)$$
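The same kind of check works for (2.57). This sketch (helper names hypothetical) computes harmonic numbers exactly with `Fraction` and evaluates the falling power $n^{\underline 2} = n(n-1)$ directly:

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def falling(x: int, m: int) -> int:
    """Falling power x(x-1)...(x-m+1), for m >= 0."""
    result = 1
    for i in range(m):
        result *= x - i
    return result


def lhs(n: int) -> Fraction:
    """Brute force: sum of k * H_k over 0 <= k < n."""
    return sum((k * harmonic(k) for k in range(n)), Fraction(0))


def rhs(n: int) -> Fraction:
    """Right-hand side of (2.57): (n falling 2)/2 * (H_n - 1/2)."""
    return Fraction(falling(n, 2), 2) * (harmonic(n) - Fraction(1, 2))


for n in range(25):
    assert lhs(n) == rhs(n), n
print(rhs(3))  # 4
```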

2.7 INFINITE SUMS 

When we defined Σ-notation at the beginning of this chapter, we
finessed the question of infinite sums by saying, in essence, “Wait until later.
For now, we can assume that all the sums we meet have only finitely many
nonzero terms.” But the time of reckoning has finally arrived; we must face
the fact that sums can be infinite. And the truth is that infinite sums are
bearers of both good news and bad news.

This is finesse?

Sure: 1 + 2 +
4 + 8 + ··· is the
“infinite precision”
representation of
the number −1,
in a binary computer
with infinite
word size.

First, the bad news: It turns out that the methods we’ve used for manipulating
Σ’s are not always valid when infinite sums are involved. But next,
the good news: There is a large, easily understood class of infinite sums for 
which all the operations we’ve been performing are perfectly legitimate. The 
reasons underlying both these news items will be clear after we have looked 
more closely at the underlying meaning of summation. 

Everybody knows what a finite sum is: We add up a bunch of terms, one 
by one, until they’ve all been added. But an infinite sum needs to be defined 
more carefully, lest we get into paradoxical situations. 

For example, it seems natural to define things so that the infinite sum 

$$S = 1 + \frac12 + \frac14 + \frac18 + \frac1{16} + \frac1{32} + \cdots$$

is equal to 2, because if we double it we get

$$2S = 2 + 1 + \frac12 + \frac14 + \frac18 + \frac1{16} + \cdots = 2 + S.$$

On the other hand, this same reasoning suggests that we ought to define 

$$T = 1 + 2 + 4 + 8 + 16 + 32 + \cdots$$

to be −1, for if we double it we get

$$2T = 2 + 4 + 8 + 16 + 32 + 64 + \cdots = T - 1.$$

Something funny is going on; how can we get a negative number by summing 
positive quantities? It seems better to leave T undefined; or perhaps we should 
say that T = ∞, since the terms being added in T become larger than any
fixed, finite number. (Notice that ∞ is another “solution” to the equation
2T = T − 1; it also “solves” the equation 2S = 2 + S.)

Let’s try to formulate a good definition for the value of a general sum
$\sum_{k\in K} a_k$, where K might be infinite. For starters, let’s assume that all the
terms $a_k$ are nonnegative. Then a suitable definition is not hard to find: If
there’s a bounding constant A such that

$$\sum_{k\in F} a_k \le A$$

for all finite subsets $F\subseteq K$, then we define $\sum_{k\in K} a_k$ to be the least such A.
(It follows from well-known properties of the real numbers that the set of
all such A always contains a smallest element.) But if there’s no bounding
constant A, we say that $\sum_{k\in K} a_k = \infty$; this means that if A is any real
number, there’s a set of finitely many terms whose sum exceeds A.
The definition in the previous paragraph has been formulated carefully
so that it doesn’t depend on any order that might exist in the index set K.
Therefore the arguments we are about to make will apply to multiple sums
with many indices $k_1, k_2, \ldots$, not just to sums over the set of integers.

In the special case that K is the set of nonnegative integers, our definition
for nonnegative terms $a_k$ implies that

$$\sum_{k\ge0} a_k = \lim_{n\to\infty}\sum_{k=0}^{n} a_k.$$

Here’s why: Any nondecreasing sequence of real numbers has a limit (possibly
∞). If the limit is A, and if F is any finite set of nonnegative integers
whose elements are all ≤ n, we have $\sum_{k\in F} a_k \le \sum_{k=0}^{n} a_k \le A$; hence A = ∞
or A is a bounding constant. And if A' is any number less than the stated
limit A, then there’s an n such that $\sum_{k=0}^{n} a_k > A'$; hence the finite set
F = {0, 1, ..., n} witnesses to the fact that A' is not a bounding constant.

We can now easily compute the value of certain infinite sums, according
to the definition just given. For example, if $a_k = x^k$, we have

$$\sum_{k\ge0} x^k = \lim_{n\to\infty}\frac{1 - x^{n+1}}{1 - x} = \begin{cases} 1/(1-x), & \text{if } 0 \le x < 1;\\ \infty, & \text{if } x \ge 1.\end{cases}$$

The set K might
even be uncount-
able. But only a
countable num-
ber of terms can
be nonzero, if a
bounding constant
A exists, because at
most nA terms are
≥ 1/n.

In particular, the infinite sums S and T considered a minute ago have the
respective values 2 and ∞, just as we suspected. Another interesting example is

$$\sum_{k\ge0}\frac{1}{(k+1)(k+2)} = \sum_{k\ge0} k^{\underline{-2}} = \lim_{n\to\infty}\sum_{k=0}^{n-1} k^{\underline{-2}} = \lim_{n\to\infty}\frac{k^{\underline{-1}}}{-1}\Bigg|_0^n = 1.$$
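The definition of a sum of nonnegative terms as a least upper bound can be watched in action. In this sketch (the helper `partial_sums` is our own), the partial sums of $1/((k+1)(k+2))$ equal $m/(m+1)$ after m terms, as the telescoping above predicts, and the geometric partial sums with x = 1/3 never decrease and stay below $1/(1-x) = 3/2$:

```python
from fractions import Fraction
from itertools import accumulate


def partial_sums(term, n):
    """List of the partial sums term(0) + ... + term(m-1), for m = 1, ..., n."""
    return list(accumulate(term(k) for k in range(n)))


# a_k = 1/((k+1)(k+2)); the m-th partial sum telescopes to m/(m+1).
tele = partial_sums(lambda k: Fraction(1, (k + 1) * (k + 2)), 50)
assert all(s == Fraction(m, m + 1) for m, s in enumerate(tele, start=1))

# a_k = x**k with x = 1/3; every partial sum stays below 1/(1 - x) = 3/2.
geo = partial_sums(lambda k: Fraction(1, 3) ** k, 50)
assert all(s < Fraction(3, 2) for s in geo)

# Partial sums of nonnegative terms never decrease, so the limit exists.
assert all(a <= b for a, b in zip(tele, tele[1:]))
print(tele[-1], float(geo[-1]))
```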


Now let’s consider the case that the sum might have negative terms as 
well as nonnegative ones. What, for example, should be the value of 


$$\sum_{k\ge0}(-1)^k = 1 - 1 + 1 - 1 + 1 - 1 + \cdots\,?$$

If we group the terms in pairs, we get

$$(1-1) + (1-1) + (1-1) + \cdots = 0 + 0 + 0 + \cdots,$$

so the sum comes out zero; but if we start the pairing one step later, we get

$$1 - (1-1) - (1-1) - (1-1) - \cdots = 1 - 0 - 0 - 0 - \cdots;$$

the sum is 1.


“The aggregate of
the quantities
a − a + a − a + a − a,
etc., is now = a,
now = 0; hence,
when the series is
continued to infinity,
it should be set = a/2.
I acknowledge the
acuteness and truth
of your remark.”

— G. Grandi [163] 
Is this the first page
with no graffiti?


We might also try setting x = −1 in the formula $\sum_{k\ge0} x^k = 1/(1-x)$,
since we’ve proved that this formula holds when $0 \le x < 1$; but then we are
forced to conclude that the infinite sum is 1/2, although it’s a sum of integers!

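Another interesting example is the doubly infinite sum $\sum_k a_k$ where
$a_k = 1/(k+1)$ for $k \ge 0$ and $a_k = 1/(k-1)$ for $k < 0$. We can write this as

$$\cdots + \Bigl(-\frac14\Bigr) + \Bigl(-\frac13\Bigr) + \Bigl(-\frac12\Bigr) + 1 + \frac12 + \frac13 + \frac14 + \cdots. \qquad (2.58)$$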


If we evaluate this sum by starting at the “center” element and working
outward,

$$\cdots + \Bigl(-\frac14 + \Bigl(-\frac13 + \Bigl(-\frac12 + (1) + \frac12\Bigr) + \frac13\Bigr) + \frac14\Bigr) + \cdots,$$

we get the value 1; and we obtain the same value 1 if we shift all the
parentheses one step to the left,

$$\cdots + \Bigl(-\frac15 + \Bigl(-\frac14 + \Bigl(-\frac13 + \Bigl(-\frac12\Bigr) + 1\Bigr) + \frac12\Bigr) + \frac13\Bigr) + \cdots,$$

because the sum of all numbers inside the innermost n parentheses is

$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{n-1} = 1 - \frac1n - \frac{1}{n+1}.$$


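A similar argument shows that the value is 1 if these parentheses are shifted
any fixed amount to the left or right; this encourages us to believe that the
sum is indeed 1. On the other hand, if we group terms in the following way,

$$\cdots + \Bigl(-\frac15 + \Bigl(-\frac14 + \Bigl(-\frac13 + \Bigl(-\frac12 + 1 + \frac12\Bigr) + \frac13 + \frac14\Bigr) + \frac15 + \frac16\Bigr) + \frac17 + \frac18\Bigr) + \cdots,$$

the nth pair of parentheses from inside out contains the numbers

$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{2n-1} + \frac{1}{2n} = 1 + H_{2n} - H_{n+1}.$$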


We’ll prove in Chapter 9 that $\lim_{n\to\infty}(H_{2n} - H_{n+1}) = \ln 2$; hence this grouping
suggests that the doubly infinite sum should really be equal to $1 + \ln 2$.
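These groupings can be replayed exactly. The sketch below (function names hypothetical) encodes $a_k$ from (2.58), sums the terms inside the innermost n parentheses for each of the three groupings, and uses a floating-point version of the third grouping to compare against $1 + \ln 2$ for large n:

```python
import math
from fractions import Fraction


def a(k: int) -> Fraction:
    """Terms of (2.58): 1/(k+1) for k >= 0 and 1/(k-1) for k < 0."""
    return Fraction(1, k + 1) if k >= 0 else Fraction(1, k - 1)


def window(lo: int, hi: int) -> Fraction:
    """Exact sum of a_k for lo <= k <= hi."""
    return sum((a(k) for k in range(lo, hi + 1)), Fraction(0))


def harmonic(n: int) -> Fraction:
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


def centered(n: int) -> Fraction:
    """Innermost n parentheses of the centered grouping: -(n-1) <= k <= n-1."""
    return window(-(n - 1), n - 1)


def shifted_left(n: int) -> Fraction:
    """Innermost n parentheses after shifting one step left: -n <= k <= n-2."""
    return window(-n, n - 2)


def lopsided(n: int) -> Fraction:
    """The nth pair from inside in the third grouping: -n <= k <= 2n-1."""
    return window(-n, 2 * n - 1)


def lopsided_float(n: int) -> float:
    """Floating-point version of lopsided(n), usable for large n."""
    return math.fsum(1.0 / (k + 1) if k >= 0 else 1.0 / (k - 1)
                     for k in range(-n, 2 * n))


print(centered(10), shifted_left(10), float(lopsided(100)), 1 + math.log(2))
```

The first grouping gives exactly 1 at every stage, the second approaches 1, and the third creeps toward $1 + \ln 2$.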

There’s something flaky about a sum that gives different values when 
its terms are added up in different ways. Advanced texts on analysis have 
a variety of definitions by which meaningful values can be assigned to such 
pathological sums; but if we adopt those definitions, we cannot operate with 
Σ-notation as freely as we have been doing. We don’t need the delicate
refinements of “conditional convergence” for the purposes of this book; therefore
we’ll stick to a definition of infinite sums that preserves the validity of all the 
operations we’ve been doing in this chapter. 
In fact, our definition of infinite sums is quite simple. Let K be any
set, and let $a_k$ be a real-valued term defined for each $k\in K$. (Here ‘k’
might actually stand for several indices $k_1, k_2, \ldots$, and K might therefore be
multidimensional.) Any real number x can be written as the difference of its
positive and negative parts,

$$x = x^+ - x^-, \qquad\text{where } x^+ = x\cdot[x>0] \text{ and } x^- = -x\cdot[x<0].$$

(Either $x^+ = 0$ or $x^- = 0$.) We’ve already explained how to define values for
the infinite sums $\sum_{k\in K} a_k^+$ and $\sum_{k\in K} a_k^-$, because $a_k^+$ and $a_k^-$ are nonnegative.
Therefore our general definition is

$$\sum_{k\in K} a_k = \sum_{k\in K} a_k^+ - \sum_{k\in K} a_k^-, \qquad (2.59)$$

unless the right-hand sums are both equal to ∞. In the latter case, we leave
$\sum_{k\in K} a_k$ undefined.

Let $A^+ = \sum_{k\in K} a_k^+$ and $A^- = \sum_{k\in K} a_k^-$. If $A^+$ and $A^-$ are both finite,
the sum $\sum_{k\in K} a_k$ is said to converge absolutely to the value $A = A^+ - A^-$.
If $A^+ = \infty$ but $A^-$ is finite, the sum $\sum_{k\in K} a_k$ is said to diverge to +∞.
Similarly, if $A^- = \infty$ but $A^+$ is finite, $\sum_{k\in K} a_k$ is said to diverge to −∞. If
$A^+ = A^- = \infty$, all bets are off.
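Here is a sketch of the split into positive and negative parts, assuming a finite list of floats is a fair stand-in for the terms (the helper names `pos`, `neg`, and `split_sums` are ours). For $a_k = (-1)^k/(k+1)$, both $A^+$ and $A^-$ keep growing as more terms are included, roughly like half a logarithm, so this sum lands in the “all bets are off” case even though its ordinary partial sums settle down:

```python
import math


def pos(x: float) -> float:
    """Positive part x^+ = x * [x > 0]."""
    return x if x > 0 else 0.0


def neg(x: float) -> float:
    """Negative part x^- = -x * [x < 0]."""
    return -x if x < 0 else 0.0


def split_sums(terms):
    """Return (A+, A-): the sums of the positive and negative parts of finitely many terms."""
    terms = list(terms)
    return math.fsum(pos(t) for t in terms), math.fsum(neg(t) for t in terms)


# Every real number is x^+ - x^-, and one of the two parts is always zero.
for x in (-2.5, 0.0, 3.0):
    assert x == pos(x) - neg(x) and pos(x) * neg(x) == 0

for n in (10, 1000, 100000):
    a_plus, a_minus = split_sums((-1) ** k / (k + 1) for k in range(n))
    print(n, round(a_plus, 3), round(a_minus, 3), round(a_plus - a_minus, 6))
```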

We started with a definition that worked for nonnegative terms, then we 
extended it to real-valued terms. If the terms $a_k$ are complex numbers, we
can extend the definition once again, in the obvious way: The sum $\sum_{k\in K} a_k$
is defined to be $\sum_{k\in K}\Re a_k + i\sum_{k\in K}\Im a_k$, where $\Re a_k$ and $\Im a_k$ are the real
and imaginary parts of $a_k$, provided that both of those sums are defined.
Otherwise $\sum_{k\in K} a_k$ is undefined. (See exercise 18.)

The bad news, as stated earlier, is that some infinite sums must be left 
undefined, because the manipulations we’ve been doing can produce inconsis- 
tencies in all such cases. (See exercise 34.) The good news is that all of the 
manipulations of this chapter are perfectly valid whenever we’re dealing with 
sums that converge absolutely, as just defined. 

We can verify the good news by showing that each of our transformation 
rules preserves the value of all absolutely convergent sums. This means, more 
explicitly, that we must prove the distributive, associative, and commutative 
laws, plus the rule for summing first on one index variable; everything else 
we’ve done has been derived from those four basic operations on sums. 

The distributive law (2.15) can be formulated more precisely as follows: 
If $\sum_{k\in K} a_k$ converges absolutely to A and if c is any complex number, then
$\sum_{k\in K} ca_k$ converges absolutely to cA. We can prove this by breaking the
sum into real and imaginary, positive and negative parts as above, and by
proving the special case in which $c > 0$ and each term $a_k$ is nonnegative. The
proof in this special case works because $\sum_{k\in F} ca_k = c\sum_{k\in F} a_k$ for finite
sets F; the latter fact follows by induction on the size of F.

In other words, ab-
solute convergence
means that the sum
of absolute values
converges.

Best to skim this
page the first time
you get here.

— Your friendly TA

The associative law (2.16) can be stated as follows: If $\sum_{k\in K} a_k$ and
$\sum_{k\in K} b_k$ converge absolutely to A and B, respectively, then $\sum_{k\in K}(a_k + b_k)$
converges absolutely to A + B. This turns out to be a special case of a more
general theorem that we will prove shortly. 

The commutative law (2.17) doesn’t really need to be proved, because 
we have shown in the discussion following (2.35) how to derive it as a special 
case of a general rule for interchanging the order of summation. 

The main result we need to prove is the fundamental principle of multiple 
sums: Absolutely convergent sums over two or more indices can always be 
summed first with respect to any one of those indices. Formally, we shall 
prove that if J and the elements of $\{K_j \mid j\in J\}$ are any sets of indices such that

$$\sum_{j\in J,\ k\in K_j} a_{j,k} \ \text{ converges absolutely to } A,$$

then there exist complex numbers $A_j$ for each $j\in J$ such that

$$\sum_{k\in K_j} a_{j,k} \ \text{ converges absolutely to } A_j, \text{ and}$$

$$\sum_{j\in J} A_j \ \text{ converges absolutely to } A.$$

It suffices to prove this assertion when all terms are nonnegative, because we 
can prove the general case by breaking everything into real and imaginary, 
positive and negative parts as before. Let’s assume therefore that $a_{j,k} \ge 0$
for all pairs $(j,k)\in M$, where M is the master index set $\{(j,k) \mid j\in J,\ k\in K_j\}$.
We are given that $\sum_{(j,k)\in M} a_{j,k}$ is finite, namely that

$$\sum_{(j,k)\in F} a_{j,k} \le A$$

for all finite subsets $F\subseteq M$, and that A is the least such upper bound. If j is
any element of J, each sum of the form $\sum_{k\in F_j} a_{j,k}$ where $F_j$ is a finite subset
of $K_j$ is bounded above by A. Hence these finite sums have a least upper
bound $A_j \ge 0$, and $\sum_{k\in K_j} a_{j,k} = A_j$ by definition.

We still need to prove that A is the least upper bound of $\sum_{j\in G} A_j$, for all
finite subsets $G\subseteq J$. Suppose that G is a finite subset of J with $\sum_{j\in G} A_j = A' > A$.
We can find finite subsets $F_j\subseteq K_j$ such that $\sum_{k\in F_j} a_{j,k} > (A/A')A_j$
for each $j\in G$ with $A_j > 0$. There is at least one such j. But then
$\sum_{j\in G}\sum_{k\in F_j} a_{j,k} > (A/A')\sum_{j\in G} A_j = A$, contradicting the fact that we have
$\sum_{(j,k)\in F} a_{j,k} \le A$ for finite subsets $F\subseteq M$. Hence $\sum_{j\in G} A_j \le A$, for all
finite subsets $G\subseteq J$.

Finally, let A' be any real number less than A. Our proof will be complete
if we can find a finite set $G\subseteq J$ such that $\sum_{j\in G} A_j > A'$. We know that there’s
a finite set $F\subseteq M$ such that $\sum_{(j,k)\in F} a_{j,k} > A'$; let G be the set of j’s in
this F, and let $F_j = \{k \mid (j,k)\in F\}$. Then
$\sum_{j\in G} A_j \ge \sum_{j\in G}\sum_{k\in F_j} a_{j,k} = \sum_{(j,k)\in F} a_{j,k} > A'$; QED.

OK, we’re now legitimate! Everything we’ve been doing with infinite 
sums is justified, as long as there’s a finite bound on all finite sums of the 
absolute values of the terms. Since the doubly infinite sum (2.58) gave us 
two different answers when we evaluated it in two different ways, its positive 
terms $1 + \frac12 + \frac13 + \cdots$ must diverge to ∞; otherwise we would have gotten the
same answer no matter how we grouped the terms.


So why have I been
hearing a lot lately 
about “harmonic 
convergence”? 


Exercises 

Warmups 

1 What does the notation

$$\sum_{k=4}^{0} q_k$$

mean?

2 Simplify the expression $x\cdot\bigl([x>0] - [x<0]\bigr)$.

3 Demonstrate your understanding of Σ-notation by writing out the sums

$$\sum_{0\le k\le5} a_k \qquad\text{and}\qquad \sum_{0\le k^2\le5} a_{k^2}$$

in full. (Watch out: the second sum is a bit tricky.)

4 Express the triple sum

$$\sum_{1\le i<j<k\le4} a_{ijk}$$

as a three-fold summation (with three Σ’s),
a summing first on k, then j, then i;

b summing first on i, then j, then k.

Also write your triple sums out in full without the Σ-notation, using
parentheses to show what is being added together first.
5 What’s wrong with the following derivation?

$$\Bigl(\sum_{j=1}^{n} a_j\Bigr)\Bigl(\sum_{k=1}^{n}\frac{1}{a_k}\Bigr) = \sum_{j=1}^{n}\sum_{k=1}^{n}\frac{a_j}{a_k} = \sum_{k=1}^{n}\sum_{k=1}^{n}\frac{a_k}{a_k} = \sum_{k=1}^{n} n = n^2.$$


Yield to the rising 
power. 


6 What is the value of $\sum_k [1 \le j \le k \le n]$, as a function of j and n?

7 Let $\nabla f(x) = f(x) - f(x-1)$. What is $\nabla(x^{\overline{m}})$?

8 What is the value of $0^{\underline{m}}$, when m is a given integer?

9 What is the law of exponents for rising factorial powers, analogous to
(2.52)? Use this to define $x^{\overline{-n}}$.

10 The text derives the following formula for the difference of a product: 


$$\Delta(uv) = u\,\Delta v + Ev\,\Delta u.$$


How can this formula be correct, when the left-hand side is symmetric 
with respect to u and v but the right-hand side is not? 

Basics 

11 The general rule (2.56) for summation by parts is equivalent to

$$\sum_{0\le k<n}(a_{k+1} - a_k)\,b_k = a_nb_n - a_0b_0 - \sum_{0\le k<n} a_{k+1}(b_{k+1} - b_k), \qquad\text{for } n \ge 0.$$

Prove this formula directly by using the distributive, associative, and 
commutative laws. 

12 Show that the function $p(k) = k + (-1)^k c$ is a permutation of the set of
all integers, whenever c is an integer.

13 Use the repertoire method to find a closed form for $\sum_{k=0}^{n}(-1)^k k^2$.

14 Evaluate $\sum_{k=1}^{n} k2^k$ by rewriting it as the multiple sum $\sum_{1\le j\le k\le n} 2^k$.

15 Evaluate $\boxdot_n = \sum_{k=1}^{n} k^3$ by the text’s Method 5 as follows: First write
$\boxdot_n + \square_n = 2\sum_{1\le j\le k\le n} jk$; then apply (2.33).

16 Prove that $x^{\underline{m}}/(x-n)^{\underline{m}} = x^{\underline{n}}/(x-m)^{\underline{n}}$, unless one of the denominators
is zero.

17 Show that the following formulas can be used to convert between rising 
and falling factorial powers, for all integers m: 

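$$x^{\overline{m}} = (-1)^m(-x)^{\underline{m}} = (x+m-1)^{\underline{m}} = 1/(x-1)^{\underline{-m}};$$

$$x^{\underline{m}} = (-1)^m(-x)^{\overline{m}} = (x-m+1)^{\overline{m}} = 1/(x+1)^{\overline{-m}}.$$

(The answer to exercise 9 defines $x^{\overline{-m}}$.)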
18 Let $\Re z$ and $\Im z$ be the real and imaginary parts of the complex number
z. The absolute value $|z|$ is $\sqrt{(\Re z)^2 + (\Im z)^2}$. A sum $\sum_{k\in K} a_k$ of
complex terms $a_k$ is said to converge absolutely when the real-valued
sums $\sum_{k\in K}\Re a_k$ and $\sum_{k\in K}\Im a_k$ both converge absolutely. Prove that
$\sum_{k\in K} a_k$ converges absolutely if and only if there is a bounding constant
B such that $\sum_{k\in F}|a_k| \le B$ for all finite subsets $F\subseteq K$.

Homework exercises 

19 Use a summation factor to solve the recurrence 


$$T_0 = 5;\qquad 2T_n = nT_{n-1} + 3\cdot n!, \quad\text{for } n > 0.$$


20 Try to evaluate $\sum_{k=0}^{n} kH_k$ by the perturbation method, but deduce the
value of $\sum_{k=0}^{n} H_k$ instead.

21 Evaluate the sums $S_n = \sum_{k=0}^{n}(-1)^{n-k}$, $T_n = \sum_{k=0}^{n}(-1)^{n-k}k$, and
$U_n = \sum_{k=0}^{n}(-1)^{n-k}k^2$ by the perturbation method, assuming that $n \ge 0$.

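22 Prove Lagrange’s identity (without using induction):

$$\sum_{1\le j<k\le n}(a_jb_k - a_kb_j)^2 = \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)\Bigl(\sum_{k=1}^{n} b_k^2\Bigr) - \Bigl(\sum_{k=1}^{n} a_kb_k\Bigr)^2.$$

Prove, in fact, an identity for the more general double sum

$$\sum_{1\le j<k\le n}(a_jb_k - a_kb_j)(A_jB_k - A_kB_j).$$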


23 Evaluate the sum $\sum_{k=1}^{n}(2k+1)/k(k+1)$ in two ways:

a Replace $1/k(k+1)$ by the “partial fractions” $1/k - 1/(k+1)$.
b Sum by parts.

24 What is $\sum_{0\le k<n} H_k/(k+1)(k+2)$? Hint: Generalize the derivation of
(2.57).

25 The notation $\prod_{k\in K} a_k$ means the product of the numbers $a_k$ for all
$k\in K$. Assume for simplicity that $a_k \ne 1$ for only finitely many k; hence
infinite products need not be defined. What laws does this Π-notation
satisfy, analogous to the distributive, associative, and commutative laws
that hold for Σ?

26 Express the double product $\prod_{1\le j\le k\le n} a_ja_k$ in terms of the single
product $\prod_{k=1}^{n} a_k$ by manipulating Π-notation. (This exercise gives us a
product analog of the upper-triangle identity (2.33).)


It’s hard to prove 
the identity of 
somebody who’s 
been dead for 175
years. 


This notation was 
introduced by 
Jacobi in 1829 [192].

The laws of the
jungle. 


27 Compute $\Delta(c^{\underline{x}})$, and use it to deduce the value of $\sum_{k=1}^{n}(-2)^{\underline{k}}/k$.

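28 At what point does the following derivation go astray?

$$\begin{aligned}
1 = \sum_{k\ge1}\frac{1}{k(k+1)} &= \sum_{k\ge1}\Bigl(\frac{k}{k+1} - \frac{k-1}{k}\Bigr)\\
&= \sum_{k\ge1}\sum_{j\ge1}\Bigl(\frac{k}{j}[j=k+1] - \frac{j}{k}[j=k-1]\Bigr)\\
&= \sum_{j\ge1}\sum_{k\ge1}\Bigl(\frac{k}{j}[j=k+1] - \frac{j}{k}[j=k-1]\Bigr)\\
&= \sum_{j\ge1}\sum_{k\ge1}\Bigl(\frac{k}{j}[k=j-1] - \frac{j}{k}[k=j+1]\Bigr)\\
&= \sum_{j\ge1}\Bigl(\frac{j-1}{j} - \frac{j}{j+1}\Bigr) = \sum_{j\ge1}\frac{-1}{j(j+1)} = -1.
\end{aligned}$$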


Exam problems 

29 Evaluate the sum $\sum_{k=1}^{n}(-1)^k k/(4k^2 - 1)$.

30 Cribbage players have long been aware that 15 = 7 + 8 = 4 + 5 + 6 =
1 + 2 + 3 + 4 + 5. Find the number of ways to represent 1050 as a sum of
consecutive positive integers. (The trivial representation ‘1050’ by itself 
counts as one way; thus there are four, not three, ways to represent 15 
as a sum of consecutive positive integers. Incidentally, a knowledge of 
cribbage rules is of no use in this problem.) 


31 Riemann’s zeta function ζ(k) is defined to be the infinite sum

$$1 + \frac{1}{2^k} + \frac{1}{3^k} + \cdots = \sum_{j\ge1}\frac{1}{j^k}.$$

Prove that $\sum_{k\ge2}(\zeta(k) - 1) = 1$. What is the value of $\sum_{k\ge1}(\zeta(2k) - 1)$?
32 Let $a \mathbin{\dot{-}} b = \max(0, a-b)$. Prove that

$$\sum_{k\ge0}\min(k,\ x \mathbin{\dot{-}} k) = \sum_{k\ge0}\bigl(x \mathbin{\dot{-}} (2k+1)\bigr)$$

for all real $x \ge 0$, and evaluate the sums in closed form.


Bonus problems 

33 Let $\bigwedge_{k\in K} a_k$ denote the minimum of the numbers $a_k$ (or their greatest
lower bound, if K is infinite), assuming that each $a_k$ is either real or ±∞.
What laws are valid for ⋀-notation, analogous to those that work for Σ
and Π? (See exercise 25.)
34 Prove that if the sum $\sum_{k\in K} a_k$ is undefined according to (2.59), then it
is extremely flaky in the following sense: If $A^-$ and $A^+$ are any given
real numbers, it’s possible to find a sequence of finite subsets $F_1\subseteq F_2\subseteq F_3\subseteq\cdots$ of K such that

$$\sum_{k\in F_n} a_k \le A^-, \ \text{when } n \text{ is odd};\qquad \sum_{k\in F_n} a_k \ge A^+, \ \text{when } n \text{ is even}.$$

35 Prove Goldbach’s theorem

$$1 = \frac13 + \frac17 + \frac18 + \frac1{15} + \frac1{24} + \frac1{26} + \frac1{31} + \frac1{35} + \cdots = \sum_{k\in P}\frac{1}{k-1},$$

where P is the set of “perfect powers” defined recursively as follows:

$$P = \{\, m^n \mid m \ge 2,\ n \ge 2,\ m \notin P \,\}.$$

Perfect power
corrupts perfectly.


36 Solomon Golomb’s “self-describing sequence” (f(1), f(2), f(3), ...) is the
only nondecreasing sequence of positive integers with the property that
it contains exactly f(k) occurrences of k for each k. A few moments’
thought reveals that the sequence must begin as follows:

$$\begin{array}{c|cccccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ \hline
f(n) & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 4 & 5 & 5 & 5 & 6
\end{array}$$

Let g(n) be the largest integer m such that f(m) = n. Show that
a $g(n) = \sum_{k=1}^{n} f(k)$.

b $g(g(n)) = \sum_{k=1}^{n} kf(k)$.

c $g(g(g(n))) = \frac12 ng(n)\bigl(g(n)+1\bigr) - \frac12\sum_{k=1}^{n-1} g(k)\bigl(g(k)+1\bigr)$.

Research problem 

37 Will all the 1/k by 1/(k+1) rectangles, for $k \ge 1$, fit together inside a
1 by 1 square? (Recall that their areas sum to 1.)
)Ouch.(


Integer Functions 


WHOLE NUMBERS constitute the backbone of discrete mathematics, and we 
often need to convert from fractions or arbitrary real numbers to integers. Our 
goal in this chapter is to gain familiarity and fluency with such conversions 
and to learn some of their remarkable properties. 

3.1 FLOORS AND CEILINGS 

We start by covering the floor (greatest integer) and ceiling (least 
integer) functions, which are defined for all real x as follows: 

$$\begin{aligned}
\lfloor x\rfloor &= \text{the greatest integer less than or equal to } x;\\
\lceil x\rceil &= \text{the least integer greater than or equal to } x.
\end{aligned} \qquad (3.1)$$

Kenneth E. Iverson introduced this notation, as well as the names “floor” and 
“ceiling,” early in the 1960s [191, page 12]. He found that typesetters could 
handle the symbols by shaving the tops and bottoms off of ‘[’ and ‘]’. His
notation has become sufficiently popular that floor and ceiling brackets can
now be used in a technical paper without an explanation of what they mean.
Until recently, people had most often been writing ‘[x]’ for the greatest integer
≤ x, without a good equivalent for the least integer function. Some authors
had even tried to use ‘]x[’, with a predictable lack of success.

Besides variations in notation, there are variations in the functions
themselves. For example, some pocket calculators have an INT function, defined
as $\lfloor x\rfloor$ when x is positive and $\lceil x\rceil$ when x is negative. The designers of
these calculators probably wanted their INT function to satisfy the
identity INT(−x) = −INT(x). But we’ll stick to our floor and ceiling functions,
because they have even nicer properties than this. 

One good way to become familiar with the floor and ceiling functions 
is to understand their graphs, which form staircase-like patterns above and 
below the line f(x) = x:

[Figure: graphs of ⌊x⌋ and ⌈x⌉, staircases lying on or below and on or above the line f(x) = x]



We see from the graph that, for example, 

$$\lfloor e\rfloor = 2, \qquad \lfloor -e\rfloor = -3,$$

$$\lceil e\rceil = 3, \qquad \lceil -e\rceil = -2,$$

since e = 2.71828....

By staring at this illustration we can observe several facts about floors 
and ceilings. First, since the floor function lies on or below the diagonal line 
f(x) = x, we have $\lfloor x\rfloor \le x$; similarly $\lceil x\rceil \ge x$. (This, of course, is quite
obvious from the definition.) The two functions are equal precisely at the
integer points:

$$\lfloor x\rfloor = x \iff x \text{ is an integer} \iff \lceil x\rceil = x.$$

(We use the notation ‘⟺’ to mean “if and only if.”) Furthermore, when
they differ the ceiling is exactly 1 higher than the floor:

$$\lceil x\rceil - \lfloor x\rfloor = [x \text{ is not an integer}]. \qquad (3.2)$$

Cute.
By Iverson’s bracket
convention, this is a
complete equation.

If we shift the diagonal line down one unit, it lies completely below the floor
function, so $x - 1 < \lfloor x\rfloor$; similarly $x + 1 > \lceil x\rceil$. Combining these observations
gives us

$$x - 1 < \lfloor x\rfloor \le x \le \lceil x\rceil < x + 1. \qquad (3.3)$$


Finally, the functions are reflections of each other about both axes: 

$$\lfloor -x\rfloor = -\lceil x\rceil; \qquad \lceil -x\rceil = -\lfloor x\rfloor. \qquad (3.4)$$
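Here is a quick Python check of (3.2), (3.3), and (3.4) on exact rationals. Python’s `math.floor`, `math.ceil`, and `math.trunc` accept `Fraction` values. The calculator INT described earlier is modeled here, as an assumption, by `math.trunc`, which rounds toward zero; the wrapper name `INT` is ours:

```python
import math
from fractions import Fraction


def INT(x):
    """Hypothetical calculator INT: floor for x > 0, ceiling for x < 0,
    which is truncation toward zero."""
    return math.trunc(x)


samples = [Fraction(p, q) for p in range(-30, 31) for q in range(1, 7)]
samples.append(Fraction(math.e))  # exact rational value of the float nearest e

for x in samples:
    fl, ce = math.floor(x), math.ceil(x)
    is_int = x.denominator == 1
    assert ce - fl == (0 if is_int else 1)                  # (3.2)
    assert x - 1 < fl <= x <= ce < x + 1                    # (3.3)
    assert math.floor(-x) == -ce and math.ceil(-x) == -fl   # (3.4)
    assert INT(-x) == -INT(x)                               # the symmetry INT was built for

print(math.floor(math.e), math.floor(-math.e), math.ceil(math.e), math.ceil(-math.e))  # 2 -3 3 -2
print(INT(-2.5), math.floor(-2.5))  # -2 -3
```

The last line shows where INT and the floor part ways: on negative non-integers.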
Next week we’re
getting walls. 


Thus each is easily expressible in terms of the other. This fact helps to 
explain why the ceiling function once had no notation of its own. But we 
see ceilings often enough to warrant giving them special symbols, just as we 
have adopted special notations for rising powers as well as falling powers. 
Mathematicians have long had both sine and cosine, tangent and cotangent, 
secant and cosecant, max and min; now we also have both floor and ceiling. 

To actually prove properties about the floor and ceiling functions, rather 
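than just to observe such facts graphically, the following four rules are
especially useful:

$$\begin{aligned}
\lfloor x\rfloor = n &\iff n \le x < n+1, &&\text{(a)}\\
\lfloor x\rfloor = n &\iff x - 1 < n \le x, &&\text{(b)}\\
\lceil x\rceil = n &\iff n - 1 < x \le n, &&\text{(c)}\\
\lceil x\rceil = n &\iff x \le n < x + 1. &&\text{(d)}
\end{aligned} \qquad (3.5)$$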

(We assume in all four cases that n is an integer and that x is real.) Rules
(a) and (c) are immediate consequences of definition (3.1); rules (b) and (d) 
are the same but with the inequalities rearranged so that n is in the middle. 
It’s possible to move an integer term in or out of a floor (or ceiling): 


$$\lfloor x + n\rfloor = \lfloor x\rfloor + n, \qquad\text{integer } n. \qquad (3.6)$$

(Because rule (3.5(a)) says that this assertion is equivalent to the inequalities
$\lfloor x\rfloor + n \le x + n < \lfloor x\rfloor + n + 1$.) But similar operations, like moving out a
constant factor, cannot be done in general. For example, we have $\lfloor nx\rfloor \ne n\lfloor x\rfloor$
when n = 2 and x = 1/2. This means that floor and ceiling brackets are
comparatively inflexible. We are usually happy if we can get rid of them or if 
we can prove anything at all when they are present. 

It turns out that there are many situations in which floor and ceiling 
brackets are redundant, so that we can insert or delete them at will. For 
example, any inequality between a real and an integer is equivalent to a floor 
or ceiling inequality between integers: 

$$\begin{aligned}
x < n &\iff \lfloor x\rfloor < n, &&\text{(a)}\\
n < x &\iff n < \lceil x\rceil, &&\text{(b)}\\
x \le n &\iff \lceil x\rceil \le n, &&\text{(c)}\\
n \le x &\iff n \le \lfloor x\rfloor. &&\text{(d)}
\end{aligned} \qquad (3.7)$$

These rules are easily proved. For example, if $x < n$ then surely $\lfloor x\rfloor < n$, since
$\lfloor x\rfloor \le x$. Conversely, if $\lfloor x\rfloor < n$ then we must have $x < n$, since $x < \lfloor x\rfloor + 1$
and $\lfloor x\rfloor + 1 \le n$.

It would be nice if the four rules in (3.7) were as easy to remember as
they are to prove. Each inequality without floor or ceiling corresponds to the
same inequality with floor or with ceiling; but we need to think twice before
deciding which of the two is appropriate.
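Rules (3.5), (3.6), and (3.7) can be spot-checked mechanically. The hypothetical helper below asserts all of them for every rational x in a small grid and every integer n in a range, using exact `Fraction` values so that no rounding can blur an inequality:

```python
import math
from fractions import Fraction


def check_floor_rules(xs, ns) -> int:
    """Assert (3.5), (3.6), and (3.7) for every pair (x, n); return the number of pairs checked."""
    checked = 0
    for x in xs:
        fl, ce = math.floor(x), math.ceil(x)
        for n in ns:
            # (3.5)(a),(b): two characterizations of the floor
            assert (fl == n) == (n <= x < n + 1) == (x - 1 < n <= x)
            # (3.5)(c),(d): two characterizations of the ceiling
            assert (ce == n) == (n - 1 < x <= n) == (x <= n < x + 1)
            # (3.6): an integer moves in or out of a floor
            assert math.floor(x + n) == fl + n
            # (3.7)(a)-(d): real-versus-integer inequalities
            assert (x < n) == (fl < n)
            assert (n < x) == (n < ce)
            assert (x <= n) == (ce <= n)
            assert (n <= x) == (n <= fl)
            checked += 1
    return checked


xs = [Fraction(p, q) for p in range(-40, 41) for q in range(1, 6)]
print(check_floor_rules(xs, range(-10, 11)))
# A constant factor cannot be pulled out: floor(2 * 1/2) = 1, but 2 * floor(1/2) = 0.
print(math.floor(2 * Fraction(1, 2)), 2 * math.floor(Fraction(1, 2)))  # 1 0
```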

The difference between x and $\lfloor x\rfloor$ is called the fractional part of x, and
it arises often enough in applications to deserve its own notation:

$$\{x\} = x - \lfloor x\rfloor. \qquad (3.8)$$

We sometimes call $\lfloor x\rfloor$ the integer part of x, since $x = \lfloor x\rfloor + \{x\}$. If a real
number x can be written in the form $x = n + \theta$, where n is an integer and
$0 \le \theta < 1$, we can conclude by (3.5(a)) that $n = \lfloor x\rfloor$ and $\theta = \{x\}$.

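Identity (3.6) doesn’t hold if n is an arbitrary real. But we can deduce
that there are only two possibilities for $\lfloor x + y\rfloor$ in general: If we write
$x = \lfloor x\rfloor + \{x\}$ and $y = \lfloor y\rfloor + \{y\}$, then we have $\lfloor x + y\rfloor = \lfloor x\rfloor + \lfloor y\rfloor + \lfloor\{x\} + \{y\}\rfloor$.
And since $0 \le \{x\} + \{y\} < 2$, we find that sometimes $\lfloor x + y\rfloor$ is $\lfloor x\rfloor + \lfloor y\rfloor$;
otherwise it’s $\lfloor x\rfloor + \lfloor y\rfloor + 1$.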
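In code, the two cases for $\lfloor x+y\rfloor$ correspond to whether the fractional parts produce a carry. Here is a sketch with our own helper names `frac` and `floor_of_sum`, checked over a grid of sevenths:

```python
import math
from fractions import Fraction


def frac(x: Fraction) -> Fraction:
    """Fractional part {x} = x - floor(x), as in (3.8)."""
    return x - math.floor(x)


def floor_of_sum(x: Fraction, y: Fraction) -> int:
    """floor(x + y) computed as floor(x) + floor(y) + carry,
    where the carry is 1 exactly when {x} + {y} >= 1."""
    carry = 1 if frac(x) + frac(y) >= 1 else 0
    return math.floor(x) + math.floor(y) + carry


vals = [Fraction(p, 7) for p in range(-30, 31)]
for x in vals:
    assert 0 <= frac(x) < 1 and x == math.floor(x) + frac(x)
    for y in vals:
        assert math.floor(x + y) == floor_of_sum(x, y)

print(floor_of_sum(Fraction(3, 4), Fraction(3, 4)), floor_of_sum(Fraction(1, 4), Fraction(1, 4)))  # 1 0
```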

3.2 FLOOR/CEILING APPLICATIONS 

We’ve now seen the basic tools for handling floors and ceilings. Let’s 
put them to use, starting with an easy problem: What’s $\lceil\lg 35\rceil$? (Following a
suggestion of Edward M. Reingold, we use ‘lg’ to denote the base-2 logarithm.)
Well, since $2^5 < 35 \le 2^6$, we can take logs to get $5 < \lg 35 \le 6$; so relation
(3.5(c)) tells us that $\lceil\lg 35\rceil = 6$.

Note that the number 35 is six bits long when written in radix 2 notation:
$35 = (100011)_2$. Is it always true that $\lceil\lg n\rceil$ is the length of n written in
binary? Not quite. We also need six bits to write $32 = (100000)_2$. So $\lceil\lg n\rceil$
is the wrong answer to the problem. (It fails only when n is a power of 2,
but that’s infinitely many failures.) We can find a correct answer by realizing
that it takes m bits to write each number n such that $2^{m-1} \le n < 2^m$; thus
(3.5(a)) tells us that $m - 1 = \lfloor\lg n\rfloor$, so $m = \lfloor\lg n\rfloor + 1$. That is, we need
$\lfloor\lg n\rfloor + 1$ bits to express n in binary, for all n > 0. Alternatively, a similar
derivation yields the answer $\lceil\lg(n+1)\rceil$; this formula holds for n = 0 as well,
if we’re willing to say that it takes zero bits to write n = 0 in binary.
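Python’s built-in `int.bit_length` returns the number of bits in the binary representation of a nonnegative integer (0 for n = 0), so it gives a direct check of both formulas. The integer-only helpers `floor_lg` and `ceil_lg` are our own, chosen to avoid floating-point logarithms:

```python
def floor_lg(n: int) -> int:
    """floor(lg n) for n >= 1, using integer halving only."""
    m = 0
    while n >= 2:
        n //= 2
        m += 1
    return m


def ceil_lg(n: int) -> int:
    """ceil(lg n) for n >= 1: the least m with 2**m >= n."""
    m = 0
    while 2 ** m < n:
        m += 1
    return m


for n in range(1, 5000):
    assert n.bit_length() == floor_lg(n) + 1 == ceil_lg(n + 1)
assert (0).bit_length() == ceil_lg(0 + 1) == 0  # zero bits for n = 0

print(ceil_lg(35), floor_lg(32) + 1, ceil_lg(32))  # 6 6 5
```

The last value shows the power-of-2 failure: $\lceil\lg 32\rceil = 5$, although 32 needs six bits.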

Let’s look next at expressions with several floors or ceilings. What is 
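$\lceil\lfloor x\rfloor\rceil$? Easy: since $\lfloor x\rfloor$ is an integer, $\lceil\lfloor x\rfloor\rceil$ is just $\lfloor x\rfloor$. So is any other
expression with an innermost $\lfloor x\rfloor$ surrounded by any number of floors or ceilings.

Here’s a tougher problem: Prove or disprove the assertion

$$\bigl\lfloor\sqrt{\lfloor x\rfloor}\bigr\rfloor = \lfloor\sqrt{x}\rfloor, \qquad\text{real } x \ge 0. \qquad (3.9)$$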


Hmmm. We’d bet- 
ter not write {x} 
for the fractional 
part when it could 
be confused with 
the set containing x 
as its only element. 


The second case 
occurs if and only 
if there’s a “carry” 
at the position of 
the decimal point, 
when the fractional 
parts {x} and {y}
are added together. 


Equality obviously holds when x is an integer, because $x = \lfloor x\rfloor$. And there’s
equality in the special cases $\pi = 3.14159\ldots$, $e = 2.71828\ldots$, and
$\phi = (1+\sqrt5)/2 = 1.61803\ldots$, because we get 1 = 1. Our failure to find a
counterexample suggests that equality holds in general, so let’s try to prove it.


(Of course π, e,
and φ are the
obvious first real 
numbers to try, 
aren’t they?) 
Skepticism is
healthy only to 
a limited extent. 
Being skeptical 
about proofs and 
programs (particu- 
larly your own) will
probably keep your 
grades healthy and 
your job fairly se- 
cure. But applying 
that much skepti- 
cism will probably 
also keep you shut 
away working all 
the time, instead 
of letting you get 
out for exercise and 
relaxation. 

Too much skepti- 
cism is an open in- 
vitation to the state 
of rigor mortis, 
where you become 
so worried about 
being correct and 
rigorous that you 
never get anything 
finished. 

— A skeptic 


(This observation 
was made by R. J. 
McEliece when he 
was an undergrad.) 


Incidentally, when we’re faced with a “prove or disprove,” we’re usually 
better off trying first to disprove with a counterexample, for two reasons: 
A disproof is potentially easier (we need just one counterexample); and nit- 
picking arouses our creative juices. Even if the given assertion is true, our 
search for a counterexample often leads us to a proof, as soon as we see why 
a counterexample is impossible. Besides, it’s healthy to be skeptical. 

If we try to prove that $\lfloor\sqrt{\lfloor x\rfloor}\rfloor = \lfloor\sqrt x\rfloor$ with the help of calculus, we might
start by decomposing x into its integer and fractional parts $\lfloor x\rfloor + \{x\} = n + \theta$
and then expanding the square root using the binomial theorem:
$(n+\theta)^{1/2} = n^{1/2} + n^{-1/2}\theta/2 - n^{-3/2}\theta^2/8 + \cdots$. But this approach gets pretty messy.

It’s much easier to use the tools we’ve developed. Here’s a possible strategy: Somehow strip off the outer floor and square root of $\lfloor\sqrt{\lfloor x\rfloor}\rfloor$, then remove the inner floor, then add back the outer stuff to get $\lfloor\sqrt{x}\rfloor$. OK. We let $m = \lfloor\sqrt{\lfloor x\rfloor}\rfloor$ and invoke (3.5(a)), giving $m \le \sqrt{\lfloor x\rfloor} < m+1$. That removes the outer floor bracket without losing any information. Squaring, since all three expressions are nonnegative, we have $m^2 \le \lfloor x\rfloor < (m+1)^2$. That gets rid of the square root. Next we remove the floor, using (3.7(d)) for the left inequality and (3.7(a)) for the right: $m^2 \le x < (m+1)^2$. It’s now a simple matter to retrace our steps, taking square roots to get $m \le \sqrt{x} < m+1$ and invoking (3.5(a)) to get $m = \lfloor\sqrt{x}\rfloor$. Thus $\lfloor\sqrt{\lfloor x\rfloor}\rfloor = m = \lfloor\sqrt{x}\rfloor$; the assertion is true. Similarly, we can prove that

$$\lceil\sqrt{\lceil x\rceil}\rceil = \lceil\sqrt{x}\rceil, \qquad \text{real } x \ge 0.$$
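As a sanity check on the two square-root identities, here is a minimal Python sketch (not from the text) that tests them on many rational points using exact integer arithmetic. The helper names `floor_sqrt`, `ceil_sqrt`, and `check_sqrt_identities` are hypothetical, and a finite test is evidence, not a proof.

```python
from fractions import Fraction
from math import ceil, floor, isqrt


def floor_sqrt(x: Fraction) -> int:
    """Exact floor of sqrt(x) for rational x = p/q >= 0."""
    p, q = x.numerator, x.denominator
    # m*m <= p/q  <=>  (m*q)**2 <= p*q  <=>  m*q <= isqrt(p*q)
    return isqrt(p * q) // q


def ceil_sqrt(x: Fraction) -> int:
    """Exact ceiling of sqrt(x) for rational x = p/q >= 0."""
    p, q = x.numerator, x.denominator
    c = isqrt(p * q)
    if c * c < p * q:          # round sqrt(p*q) up to an integer
        c += 1
    # m*m >= p/q  <=>  m*q >= c
    return -(-c // q)


def check_sqrt_identities(max_num: int = 400, max_den: int = 8) -> bool:
    """Test both identities at every x = a/b, 0 <= a < max_num, 1 <= b <= max_den."""
    for b in range(1, max_den + 1):
        for a in range(max_num):
            x = Fraction(a, b)
            if floor_sqrt(Fraction(floor(x))) != floor_sqrt(x):
                return False
            if ceil_sqrt(Fraction(ceil(x))) != ceil_sqrt(x):
                return False
    return True


print(check_sqrt_identities())
```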

The proof we just found doesn’t rely heavily on the properties of square roots. A closer look shows that we can generalize the ideas and prove much more: Let $f(x)$ be any continuous, monotonically increasing function with the property that

$$f(x) = \text{integer} \implies x = \text{integer}.$$

(The symbol ‘$\implies$’ means “implies.”) Then we have

$$\lfloor f(x)\rfloor = \lfloor f(\lfloor x\rfloor)\rfloor \quad\text{and}\quad \lceil f(x)\rceil = \lceil f(\lceil x\rceil)\rceil, \tag{3.10}$$

whenever $f(x)$, $f(\lfloor x\rfloor)$, and $f(\lceil x\rceil)$ are defined. Let’s prove this general property for ceilings, since we did floors earlier and since the proof for floors is almost the same. If $x = \lceil x\rceil$, there’s nothing to prove. Otherwise $x < \lceil x\rceil$, and $f(x) < f(\lceil x\rceil)$ since $f$ is increasing. Hence $\lceil f(x)\rceil \le \lceil f(\lceil x\rceil)\rceil$, since $\lceil\,\rceil$ is nondecreasing. If $\lceil f(x)\rceil < \lceil f(\lceil x\rceil)\rceil$, there must be a number $y$ such that $x \le y < \lceil x\rceil$ and $f(y) = \lceil f(x)\rceil$, since $f$ is continuous. This $y$ is an integer, because of $f$’s special property. But there cannot be an integer strictly between $\lfloor x\rfloor$ and $\lceil x\rceil$. This contradiction implies that we must have $\lceil f(x)\rceil = \lceil f(\lceil x\rceil)\rceil$.
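An important special case of this theorem is worth noting explicitly:

$$\left\lfloor \frac{x+m}{n} \right\rfloor = \left\lfloor \frac{\lfloor x\rfloor + m}{n} \right\rfloor \quad\text{and}\quad \left\lceil \frac{x+m}{n} \right\rceil = \left\lceil \frac{\lceil x\rceil + m}{n} \right\rceil, \tag{3.11}$$

if $m$ and $n$ are integers and the denominator $n$ is positive. For example, let $m = 0$; we have $\lfloor\lfloor\lfloor x/10\rfloor/10\rfloor/10\rfloor = \lfloor x/1000\rfloor$. Dividing thrice by 10 and throwing off digits is the same as dividing by 1000 and tossing the remainder.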
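A small Python spot check of (3.11) with exact `Fraction` arithmetic, including the $m = 0$ example; the function name `check_rule_3_11` and the sample grid are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import ceil, floor


def check_rule_3_11(samples, ms=range(-5, 6), ns=range(1, 8)) -> bool:
    """Test floor((x+m)/n) == floor((floor(x)+m)/n) and the ceiling analog."""
    for x in samples:
        for m in ms:
            for n in ns:
                if floor((x + m) / n) != floor(Fraction(floor(x) + m, n)):
                    return False
                if ceil((x + m) / n) != ceil(Fraction(ceil(x) + m, n)):
                    return False
    return True


samples = [Fraction(a, 7) for a in range(-100, 101)]
print(check_rule_3_11(samples))

# The m = 0 case: three floored divisions by 10 equal one floored division by 1000.
# For an integer k, floor(k/10) is exactly k // 10 in Python.
x = Fraction(123456789, 100)
print(floor(x / 10) // 10 // 10, floor(x / 1000))
```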
Let’s try now to prove or disprove another statement: 

$$\lceil\sqrt{\lfloor x\rfloor}\rceil = \lceil\sqrt{x}\rceil, \qquad \text{real } x \ge 0.$$

This works when $x = \pi$ and $x = e$, but it fails when $x = \phi$; so we know that it isn’t true in general.

Before going any further, let’s digress a minute to discuss different levels 
of problems that might appear in books about mathematics: 

Level 1. Given an explicit object $x$ and an explicit property $P(x)$, prove that $P(x)$ is true. For example, “Prove that $\lfloor\pi\rfloor = 3$.” Here the problem involves finding a proof of some purported fact.

Level 2. Given an explicit set $X$ and an explicit property $P(x)$, prove that $P(x)$ is true for all $x \in X$. For example, “Prove that $\lfloor x\rfloor \le x$ for all real $x$.” Again the problem involves finding a proof, but the proof this time must be general. We’re doing algebra, not just arithmetic.

Level 3. Given an explicit set $X$ and an explicit property $P(x)$, prove or disprove that $P(x)$ is true for all $x \in X$. For example, “Prove or disprove that $\lceil\sqrt{\lfloor x\rfloor}\rceil = \lceil\sqrt{x}\rceil$ for all real $x \ge 0$.” Here there’s an additional level of uncertainty; the outcome might go either way. This is closer to the real situation a mathematician constantly faces: Assertions that get into books tend to be true, but new things have to be looked at with a jaundiced eye. If the statement is false, our job is to find a counterexample. If the statement is true, we must find a proof as in level 2.

Level 4. Given an explicit set $X$ and an explicit property $P(x)$, find a necessary and sufficient condition $Q(x)$ that $P(x)$ is true. For example, “Find a necessary and sufficient condition that $\lfloor x\rfloor \ge \lceil x\rceil$.” The problem is to find $Q$ such that $P(x) \iff Q(x)$. Of course, there’s always a trivial answer; we can take $Q(x) = P(x)$. But the implied requirement is to find a condition that’s as simple as possible. Creativity is required to discover a simple condition that will work. (For example, in this case, “$\lfloor x\rfloor \ge \lceil x\rceil \iff x$ is an integer.”) The extra element of discovery needed to find $Q(x)$ makes this sort of problem more difficult, but it’s more typical of what mathematicians must do in the “real world.” Finally, of course, a proof must be given that $P(x)$ is true if and only if $Q(x)$ is true.


In my other texts “prove or disprove” seems to mean the same as “prove,” about 99.44% of the time; but not in this book.


But no simpler. 

— A. Einstein 



Home of the Toledo Mudhens.


(Or, by pessimists, half-closed.)
Level 5. Given an explicit set $X$, find an interesting property $P(x)$ of its
elements. Now we’re in the scary domain of pure research, where students 
might think that total chaos reigns. This is real mathematics. Authors of 
textbooks rarely dare to pose level 5 problems. 

End of digression. But let’s convert the last question we looked at from level 3 to level 4: What is a necessary and sufficient condition that $\lceil\sqrt{\lfloor x\rfloor}\rceil = \lceil\sqrt{x}\rceil$? We have observed that equality holds when $x = 3.142$ but not when $x = 1.618$; further experimentation shows that it fails also when $x$ is between 9 and 10. Oho. Yes. We see that bad cases occur whenever $m^2 < x < m^2 + 1$, since this gives $m$ on the left and $m + 1$ on the right. In all other cases where $\sqrt{x}$ is defined, namely when $x = 0$ or $m^2 + 1 \le x \le (m+1)^2$, we get equality. The following statement is therefore necessary and sufficient for equality: Either $x$ is an integer or $\sqrt{\lfloor x\rfloor}$ isn’t.
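The condition just derived can be compared against the identity itself on a grid of rationals. In this minimal sketch the helper names are hypothetical; `condition_q` encodes the proposed Q(x) (“$x$ is an integer, or $\lfloor x\rfloor$ is not a perfect square”) and `ceil_sqrt` computes $\lceil\sqrt{x}\rceil$ exactly.

```python
from fractions import Fraction
from math import floor, isqrt


def ceil_sqrt(x: Fraction) -> int:
    """Exact ceiling of sqrt(x) for rational x = p/q >= 0."""
    p, q = x.numerator, x.denominator
    c = isqrt(p * q)
    if c * c < p * q:
        c += 1
    return -(-c // q)


def equality_holds(x: Fraction) -> bool:
    """P(x): ceil(sqrt(floor(x))) == ceil(sqrt(x))."""
    return ceil_sqrt(Fraction(floor(x))) == ceil_sqrt(x)


def condition_q(x: Fraction) -> bool:
    """Q(x): x is an integer, or sqrt(floor(x)) is not an integer."""
    f = floor(x)
    return x.denominator == 1 or isqrt(f) ** 2 != f


xs = [Fraction(a, 8) for a in range(0, 1000)]
print(all(equality_holds(x) == condition_q(x) for x in xs))
print(equality_holds(Fraction(1618, 1000)))   # x close to phi, where the identity fails
```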

For our next problem let’s consider a handy new notation, suggested by C. A. R. Hoare and Lyle Ramshaw, for intervals of the real line: $[\alpha\,..\,\beta]$ denotes the set of real numbers $x$ such that $\alpha \le x \le \beta$. This set is called a closed interval because it contains both endpoints $\alpha$ and $\beta$. The interval containing neither endpoint, denoted by $(\alpha\,..\,\beta)$, consists of all $x$ such that $\alpha < x < \beta$; this is called an open interval. And the intervals $[\alpha\,..\,\beta)$ and $(\alpha\,..\,\beta]$, which contain just one endpoint, are defined similarly and called half-open.

How many integers are contained in such intervals? The half-open intervals are easier, so we start with them. In fact half-open intervals are almost always nicer than open or closed intervals. For example, they’re additive: we can combine the half-open intervals $[\alpha\,..\,\beta)$ and $[\beta\,..\,\gamma)$ to form the half-open interval $[\alpha\,..\,\gamma)$. This wouldn’t work with open intervals because the point $\beta$ would be excluded, and it could cause problems with closed intervals because $\beta$ would be included twice.

Back to our problem. The answer is easy if $\alpha$ and $\beta$ are integers: Then $[\alpha\,..\,\beta)$ contains the $\beta - \alpha$ integers $\alpha, \alpha+1, \ldots, \beta-1$, assuming that $\alpha \le \beta$. Similarly $(\alpha\,..\,\beta]$ contains $\beta - \alpha$ integers in such a case. But our problem is harder, because $\alpha$ and $\beta$ are arbitrary reals. We can convert it to the easier problem, though, since

$$
\begin{aligned}
\alpha \le n < \beta &\iff \lceil\alpha\rceil \le n < \lceil\beta\rceil,\\
\alpha < n \le \beta &\iff \lfloor\alpha\rfloor < n \le \lfloor\beta\rfloor,
\end{aligned}
$$

when $n$ is an integer, according to (3.7). The intervals on the right have integer endpoints and contain the same number of integers as those on the left, which have real endpoints. So the interval $[\alpha\,..\,\beta)$ contains exactly $\lceil\beta\rceil - \lceil\alpha\rceil$ integers, and $(\alpha\,..\,\beta]$ contains $\lfloor\beta\rfloor - \lfloor\alpha\rfloor$. This is a case where we actually want to introduce floor or ceiling brackets, instead of getting rid of them.
By the way, there’s a mnemonic for remembering which case uses floors and which uses ceilings: Half-open intervals that include the left endpoint but not the right (such as $0 \le \theta < 1$) are slightly more common than those that include the right endpoint but not the left; and floors are slightly more common than ceilings. So by Murphy’s Law, the correct rule is the opposite of what we’d expect: ceilings for $[\alpha\,..\,\beta)$ and floors for $(\alpha\,..\,\beta]$.

Similar analyses show that the closed interval $[\alpha\,..\,\beta]$ contains exactly $\lfloor\beta\rfloor - \lceil\alpha\rceil + 1$ integers and that the open interval $(\alpha\,..\,\beta)$ contains $\lceil\beta\rceil - \lfloor\alpha\rfloor - 1$; but we place the additional restriction $\alpha \ne \beta$ on the latter so that the formula won’t ever embarrass us by claiming that an empty interval $(\alpha\,..\,\alpha)$ contains a total of $-1$ integers. To summarize, we’ve deduced the following facts:


Just like we can remember the date of Columbus’s departure by singing, “In fourteen hundred and ninety-three / Columbus sailed the deep blue sea.”


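$$
\begin{array}{lll}
\text{interval} & \text{integers contained} & \text{restrictions}\\
\hline
{[\alpha\,..\,\beta]} & \lfloor\beta\rfloor - \lceil\alpha\rceil + 1 & \alpha \le \beta\\
{[\alpha\,..\,\beta)} & \lceil\beta\rceil - \lceil\alpha\rceil & \alpha \le \beta\\
{(\alpha\,..\,\beta]} & \lfloor\beta\rfloor - \lfloor\alpha\rfloor & \alpha \le \beta\\
{(\alpha\,..\,\beta)} & \lceil\beta\rceil - \lfloor\alpha\rfloor - 1 & \alpha < \beta
\end{array}
\tag{3.12}
$$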
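Table (3.12) can be checked against brute-force counting. This Python sketch uses `Fraction` endpoints on a quarter-integer grid (an arbitrary choice), skips the cases excluded by the restrictions column, and uses hypothetical function names.

```python
from fractions import Fraction
from math import ceil, floor


def count_brute(alpha, beta, closed_left: bool, closed_right: bool) -> int:
    """Count the integers n in the interval by direct enumeration."""
    lo, hi = floor(alpha) - 1, ceil(beta) + 1
    return sum(
        1
        for n in range(lo, hi + 1)
        if (alpha <= n if closed_left else alpha < n)
        and (n <= beta if closed_right else n < beta)
    )


def count_formula(alpha, beta, closed_left: bool, closed_right: bool) -> int:
    """Integer counts read off table (3.12)."""
    if closed_left and closed_right:        # [alpha..beta]
        return floor(beta) - ceil(alpha) + 1
    if closed_left:                         # [alpha..beta)
        return ceil(beta) - ceil(alpha)
    if closed_right:                        # (alpha..beta]
        return floor(beta) - floor(alpha)
    return ceil(beta) - floor(alpha) - 1    # (alpha..beta)


points = [Fraction(a, 4) for a in range(-12, 13)]
ok = True
for alpha in points:
    for beta in points:
        for cl in (True, False):
            for cr in (True, False):
                # Restrictions: alpha <= beta always; alpha < beta for the open interval.
                if alpha > beta or (not cl and not cr and alpha == beta):
                    continue
                ok = ok and count_brute(alpha, beta, cl, cr) == count_formula(alpha, beta, cl, cr)
print(ok)
```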


Now here’s a problem we can’t refuse. The Concrete Math Club has a casino (open only to purchasers of this book) in which there’s a roulette wheel with one thousand slots, numbered 1 to 1000. If the number $n$ that comes up on a spin is divisible by the floor of its cube root, that is, if

$$\lfloor\sqrt[3]{n}\rfloor \backslash n,$$

then it’s a winner and the house pays us \$5; otherwise it’s a loser and we must pay \$1. (The notation $a\backslash b$, read “$a$ divides $b$,” means that $b$ is an exact multiple of $a$; Chapter 4 investigates this relation carefully.) Can we expect to make money if we play this game?

We can compute the average winnings (that is, the amount we’ll win or lose per play) by first counting the number $W$ of winners and the number $L = 1000 - W$ of losers. If each number comes up once during 1000 plays, we win $5W$ dollars and lose $L$ dollars, so the average winnings will be

$$\frac{5W - L}{1000} = \frac{5W - (1000 - W)}{1000} = \frac{6W - 1000}{1000}.$$

If there are 167 or more winners, we have the advantage; otherwise the advantage is with the house.

How can we count the number of winners among 1 through 1000? It’s not hard to spot a pattern. The numbers from 1 through $2^3 - 1 = 7$ are all winners because $\lfloor\sqrt[3]{n}\rfloor = 1$ for each. Among the numbers $2^3 = 8$ through $3^3 - 1 = 26$, only the even numbers are winners. And among $3^3 = 27$ through $4^3 - 1 = 63$, only those divisible by 3 are. And so on.


(A poll of the class at this point showed that 28 students thought it was a bad idea to play, 13 wanted to gamble, and the rest were too confused to answer.)

(So we hit them with the Concrete Math Club.)
True.

Where did you say this casino is?


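The whole setup can be analyzed systematically if we use the summation techniques of Chapter 2, taking advantage of Iverson’s convention about logical statements evaluating to 0 or 1:

$$
\begin{aligned}
W &= \sum_{n=1}^{1000} [n \text{ is a winner}]\\
&= \sum_{1\le n\le 1000} \bigl[\lfloor\sqrt[3]{n}\rfloor \backslash n\bigr] = \sum_{k,n} \bigl[k = \lfloor\sqrt[3]{n}\rfloor\bigr]\,[k\backslash n]\,[1 \le n \le 1000]\\
&= \sum_{k,m,n} \bigl[k^3 \le n < (k+1)^3\bigr]\,[n = km]\,[1 \le n \le 1000]\\
&= 1 + \sum_{k,m} \bigl[k^3 \le km < (k+1)^3\bigr]\,[1 \le k < 10]\\
&= 1 + \sum_{k,m} \bigl[m \in [k^2\,..\,(k+1)^3/k)\bigr]\,[1 \le k < 10]\\
&= 1 + \sum_{1\le k<10} \bigl(\lceil k^2 + 3k + 3 + 1/k\rceil - \lceil k^2\rceil\bigr)\\
&= 1 + \sum_{1\le k<10} (3k + 4) = 1 + \frac{7 + 31}{2}\cdot 9 = 172.
\end{aligned}
$$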

This derivation merits careful study. Notice that line 6 uses our formula (3.12) for the number of integers in a half-open interval. The only “difficult” maneuver is the decision made between lines 3 and 4 to treat $n = 1000$ as a special case. (The inequality $k^3 \le n < (k+1)^3$ does not combine easily with $1 \le n \le 1000$ when $k = 10$.) In general, boundary conditions tend to be the most critical part of $\Sigma$-manipulations.

The bottom line says that $W = 172$; hence our formula for average winnings per play reduces to $(6\cdot 172 - 1000)/1000$ dollars, which is 3.2 cents. We can expect to be about \$3.20 richer after making 100 bets of \$1 each. (Of course, the house may have made some numbers more equal than others.)

The casino problem we just solved is a dressed-up version of the more mundane question, “How many integers $n$, where $1 \le n \le 1000$, satisfy the relation $\lfloor\sqrt[3]{n}\rfloor \backslash n$?” Mathematically the two questions are the same. But sometimes it’s a good idea to dress up a problem. We get to use more vocabulary (like “winners” and “losers”), which helps us to understand what’s going on.

Let’s get general. Suppose we change 1000 to 1000000, or to an even larger number, $N$. (We assume that the casino has connections and can get a bigger wheel.) Now how many winners are there?

The same argument applies, but we need to deal more carefully with the largest value of $k$, which we can call $K$ for convenience:

$$K = \lfloor\sqrt[3]{N}\rfloor.$$
(Previously $K$ was 10.) The total number of winners for general $N$ comes to

$$
\begin{aligned}
W &= \sum_{1\le k<K} (3k + 4) + \sum_m \bigl[K^3 \le Km \le N\bigr]\\
&= \tfrac12 (7 + 3K + 1)(K - 1) + \sum_m \bigl[m \in [K^2\,..\,N/K]\bigr]\\
&= \tfrac32 K^2 + \tfrac52 K - 4 + \sum_m \bigl[m \in [K^2\,..\,N/K]\bigr].
\end{aligned}
$$

We know that the remaining sum is $\lfloor N/K\rfloor - \lceil K^2\rceil + 1 = \lfloor N/K\rfloor - K^2 + 1$; hence the formula

$$W = \lfloor N/K\rfloor + \tfrac12 K^2 + \tfrac52 K - 3, \qquad K = \lfloor\sqrt[3]{N}\rfloor, \tag{3.13}$$

gives the general answer for a wheel of size $N$.
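A brute-force count of winners can be compared with (3.13). This minimal Python sketch assumes an exact integer cube root (`icbrt`, a hypothetical helper), because a floating-point cube root of a perfect cube such as $10^9$ can land just below the true value and make $K$ one too small.

```python
def icbrt(n: int) -> int:
    """Exact floor of the cube root of n >= 0, corrected with integer arithmetic."""
    k = round(n ** (1 / 3))          # floating-point guess
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k


def winners_formula(N: int) -> int:
    """W from (3.13): floor(N/K) + K^2/2 + 5K/2 - 3, with K = floor(cbrt(N))."""
    K = icbrt(N)
    return N // K + K * (K + 5) // 2 - 3   # K*(K+5) is always even


def winners_brute(N: int) -> list:
    """Running counts W(1), ..., W(N), testing floor(cbrt(n)) | n directly."""
    counts, w = [], 0
    for n in range(1, N + 1):
        if n % icbrt(n) == 0:
            w += 1
        counts.append(w)
    return counts


brute = winners_brute(20000)
print(brute[999])                      # the 1000-slot wheel
print(all(winners_formula(N) == brute[N - 1] for N in range(1, 20001)))
print([winners_formula(10 ** e) for e in range(3, 10)])
```

With $K$ computed exactly, the formula gives 1502497 for $N = 10^9$ (where $K = 1000$); substituting $K = 999$ would give 1502496 instead.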

The first two terms of this formula are approximately $N^{2/3} + \frac12 N^{2/3} = \frac32 N^{2/3}$, and the other terms are much smaller in comparison, when $N$ is large. In Chapter 9 we’ll learn how to derive expressions like

$$W = \tfrac32 N^{2/3} + O(N^{1/3}),$$

where $O(N^{1/3})$ stands for a quantity that is no more than a constant times $N^{1/3}$. Whatever the constant is, we know that it’s independent of $N$; so for large $N$ the contribution of the $O$-term to $W$ will be quite small compared with $\frac32 N^{2/3}$. For example, the following table shows how close $\frac32 N^{2/3}$ is to $W$:


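$$
\begin{array}{rrrr}
N & \tfrac32 N^{2/3} & W & \%\ \text{error}\\
\hline
1{,}000 & 150.0 & 172 & 12.791\\
10{,}000 & 696.2 & 746 & 6.670\\
100{,}000 & 3231.7 & 3343 & 3.331\\
1{,}000{,}000 & 15000.0 & 15247 & 1.620\\
10{,}000{,}000 & 69623.8 & 70158 & 0.761\\
100{,}000{,}000 & 323165.2 & 324322 & 0.357\\
1{,}000{,}000{,}000 & 1500000.0 & 1502497 & 0.166
\end{array}
$$

It’s a pretty good approximation.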




Approximate formulas are useful because they’re simpler than formulas with floors and ceilings. However, the exact truth is often important, too, especially for the smaller values of $N$ that tend to occur in practice. For example, the casino owner may have falsely assumed that there are only $\frac32 N^{2/3} = 150$ winners when $N = 1000$ (in which case there would be a 10¢ advantage for the house).
. . . without lots of generality . . .


“If $x$ be an incommensurable number less than unity, one of the series of quantities $m/x$, $m/(1-x)$, where $m$ is a whole number, can be found which shall lie between any given consecutive integers, and but one such quantity can be found.”

— Rayleigh [304] 


Right, because exactly one of the counts must increase when $n$ increases by 1.


Our last application in this section looks at so-called spectra. We define the spectrum of a real number $\alpha$ to be an infinite multiset of integers,

$$\operatorname{Spec}(\alpha) = \{\lfloor\alpha\rfloor, \lfloor 2\alpha\rfloor, \lfloor 3\alpha\rfloor, \ldots\}.$$

(A multiset is like a set but it can have repeated elements.) For example, the spectrum of $1/2$ starts out $\{0, 1, 1, 2, 2, 3, 3, \ldots\}$.

It’s easy to prove that no two spectra are equal: $\alpha \ne \beta$ implies $\operatorname{Spec}(\alpha) \ne \operatorname{Spec}(\beta)$. For, assuming without loss of generality that $\alpha < \beta$, there’s a positive integer $m$ such that $m(\beta - \alpha) \ge 1$. (In fact, any $m \ge \lceil 1/(\beta - \alpha)\rceil$ will do; but we needn’t show off our knowledge of floors and ceilings all the time.) Hence $m\beta - m\alpha \ge 1$, and $\lfloor m\beta\rfloor > \lfloor m\alpha\rfloor$. Thus $\operatorname{Spec}(\beta)$ has fewer than $m$ elements $\le \lfloor m\alpha\rfloor$, while $\operatorname{Spec}(\alpha)$ has at least $m$.

Spectra have many beautiful properties. For example, consider the two multisets

$$
\begin{aligned}
\operatorname{Spec}(\sqrt2) &= \{1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 16, 18, 19, 21, 22, 24, \ldots\},\\
\operatorname{Spec}(2+\sqrt2) &= \{3, 6, 10, 13, 17, 20, 23, 27, 30, 34, 37, 40, 44, 47, 51, \ldots\}.
\end{aligned}
$$

It’s easy to calculate $\operatorname{Spec}(\sqrt2)$ with a pocket calculator, and the $n$th element of $\operatorname{Spec}(2+\sqrt2)$ is just $2n$ more than the $n$th element of $\operatorname{Spec}(\sqrt2)$, by (3.6). A closer look shows that these two spectra are also related in a much more surprising way: It seems that any number missing from one is in the other, but that no number is in both! And it’s true: The positive integers are the disjoint union of $\operatorname{Spec}(\sqrt2)$ and $\operatorname{Spec}(2+\sqrt2)$. We say that these spectra form a partition of the positive integers.
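The partition claim is easy to test by machine. This sketch computes both spectra exactly with `math.isqrt`, since $\lfloor k\sqrt2\rfloor = \lfloor\sqrt{2k^2}\rfloor$, and uses (3.6) for the second spectrum; the function names are illustrative, and checking up to a bound is evidence, not the proof.

```python
from math import isqrt


def spec_sqrt2(count: int) -> list:
    """First `count` elements of Spec(sqrt 2): floor(k*sqrt 2) = isqrt(2*k*k)."""
    return [isqrt(2 * k * k) for k in range(1, count + 1)]


def spec_2_plus_sqrt2(count: int) -> list:
    """First `count` elements of Spec(2 + sqrt 2); by (3.6) the kth one is 2k larger."""
    return [2 * k + isqrt(2 * k * k) for k in range(1, count + 1)]


def partitions_up_to(limit: int) -> bool:
    """True if each integer 1..limit lies in exactly one of the two spectra."""
    a = [v for v in spec_sqrt2(limit) if v <= limit]
    b = [v for v in spec_2_plus_sqrt2(limit) if v <= limit]
    return sorted(a + b) == list(range(1, limit + 1))


print(spec_sqrt2(17))
print(spec_2_plus_sqrt2(15))
print(partitions_up_to(100000))
```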

To prove this assertion, we will count how many of the elements of $\operatorname{Spec}(\sqrt2)$ are $\le n$, and how many of the elements of $\operatorname{Spec}(2+\sqrt2)$ are $\le n$. If the total is $n$, for each $n$, these two spectra do indeed partition the integers.

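Let $\alpha$ be positive. The number of elements in $\operatorname{Spec}(\alpha)$ that are $\le n$ is

$$
\begin{aligned}
N(\alpha, n) &= \sum_{k>0} \bigl[\lfloor k\alpha\rfloor \le n\bigr]\\
&= \sum_{k>0} \bigl[\lfloor k\alpha\rfloor < n + 1\bigr]\\
&= \sum_{k>0} [k\alpha < n + 1]\\
&= \sum_k \bigl[0 < k < (n+1)/\alpha\bigr]\\
&= \lceil (n+1)/\alpha\rceil - 1.
\end{aligned}
\tag{3.14}
$$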
This derivation has two special points of interest. First, it uses the law

$$m \le n \iff m < n + 1, \qquad \text{integers } m \text{ and } n, \tag{3.15}$$

to change ‘$\le$’ to ‘$<$’, so that the floor brackets can be removed by (3.7). Also, and this is more subtle, it sums over the range $k > 0$ instead of $k \ge 1$, because $(n+1)/\alpha$ might be less than 1 for certain $n$ and $\alpha$. If we had tried to apply (3.12) to determine the number of integers in $[1\,..\,(n+1)/\alpha)$, rather than the number of integers in $(0\,..\,(n+1)/\alpha)$, we would have gotten the right answer; but our derivation would have been faulty because the conditions of applicability wouldn’t have been met.
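Formula (3.14) can be spot-checked for rational $\alpha$, where exact arithmetic is available. This is a minimal Python sketch with hypothetical function names; irrational $\alpha$ such as $\sqrt2$ would need a different exact method, so it is not covered here.

```python
from fractions import Fraction
from math import ceil, floor


def count_brute(alpha: Fraction, n: int) -> int:
    """Number of k >= 1 with floor(k*alpha) <= n, scanning k upward."""
    count, k = 0, 1
    while floor(k * alpha) <= n:   # floor(k*alpha) never decreases when alpha > 0
        count += 1
        k += 1
    return count


def count_formula(alpha: Fraction, n: int) -> int:
    """N(alpha, n) = ceil((n + 1)/alpha) - 1, from (3.14)."""
    return ceil((n + 1) / alpha) - 1


alphas = [Fraction(a, b) for a in range(1, 30) for b in range(1, 8)]
ok = all(count_brute(a, n) == count_formula(a, n) for a in alphas for n in range(0, 40))
print(ok)
```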

Good, we have a formula for $N(\alpha, n)$. Now we can test whether or not $\operatorname{Spec}(\sqrt2)$ and $\operatorname{Spec}(2+\sqrt2)$ partition the positive integers, by testing whether or not $N(\sqrt2, n) + N(2+\sqrt2, n) = n$ for all integers $n > 0$, using (3.14):



3.3 FLOOR/CEILING RECURRENCES 

Floors and ceilings add an interesting new dimension to the study of recurrence relations. Let’s look first at the recurrence

$$
\begin{aligned}
K_0 &= 1;\\
K_{n+1} &= 1 + \min\bigl(2K_{\lfloor n/2\rfloor}, 3K_{\lfloor n/3\rfloor}\bigr), \qquad \text{for } n \ge 0.
\end{aligned}
\tag{3.16}
$$

Thus, for example, $K_1$ is $1 + \min(2K_0, 3K_0) = 3$; the sequence begins 1, 3, 3, 4, 7, 7, 7, 9, 9, 10, 13, …. One of the authors of this book has modestly decided to call these the Knuth numbers.
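A short Python sketch (the function name is illustrative) that computes the Knuth numbers bottom-up from (3.16) and reproduces the values listed above; it makes no claim about exercise 25.

```python
def knuth_numbers(count: int) -> list:
    """K_0, ..., K_{count-1} from recurrence (3.16), computed bottom-up."""
    K = [1]                                    # K_0 = 1
    for n in range(count - 1):                 # append K_{n+1}; both indices are <= n
        K.append(1 + min(2 * K[n // 2], 3 * K[n // 3]))
    return K


print(knuth_numbers(11))
```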
Exercise 25 asks for a proof or disproof that $K_n \ge n$, for all $n \ge 0$. The first few $K$’s just listed do satisfy the inequality, so there’s a good chance that it’s true in general. Let’s try an induction proof: The basis $n = 0$ comes directly from the defining recurrence. For the induction step, we assume that the inequality holds for all values up through some fixed nonnegative $n$, and we try to show that $K_{n+1} \ge n + 1$. From the recurrence we know that $K_{n+1} = 1 + \min(2K_{\lfloor n/2\rfloor}, 3K_{\lfloor n/3\rfloor})$. The induction hypothesis tells us that $2K_{\lfloor n/2\rfloor} \ge 2\lfloor n/2\rfloor$ and $3K_{\lfloor n/3\rfloor} \ge 3\lfloor n/3\rfloor$. However, $2\lfloor n/2\rfloor$ can be as small as $n - 1$, and $3\lfloor n/3\rfloor$ can be as small as $n - 2$. The most we can conclude from our induction hypothesis is that $K_{n+1} \ge 1 + (n - 2)$; this falls far short of $K_{n+1} \ge n + 1$.

We now have reason to worry about the truth of $K_n \ge n$, so let’s try to disprove it. If we can find an $n$ such that either $2K_{\lfloor n/2\rfloor} < n$ or $3K_{\lfloor n/3\rfloor} < n$, or in other words such that

$$K_{\lfloor n/2\rfloor} < n/2 \quad\text{or}\quad K_{\lfloor n/3\rfloor} < n/3,$$

we will have $K_{n+1} < n + 1$. Can this be possible? We’d better not give the answer away here, because that will spoil exercise 25.

Recurrence relations involving floors and/or ceilings arise often in computer science, because algorithms based on the important technique of “divide and conquer” often reduce a problem of size $n$ to the solution of similar problems of integer sizes that are fractions of $n$. For example, one way to sort $n$ records, if $n > 1$, is to divide them into two approximately equal parts, one of size $\lceil n/2\rceil$ and the other of size $\lfloor n/2\rfloor$. (Notice, incidentally, that

$$n = \lceil n/2\rceil + \lfloor n/2\rfloor; \tag{3.17}$$

this formula comes in handy rather often.) After each part has been sorted separately (by the same method, applied recursively), we can merge the records into their final order by doing at most $n - 1$ further comparisons. Therefore the total number of comparisons performed is at most $f(n)$, where

$$
\begin{aligned}
f(1) &= 0;\\
f(n) &= f(\lceil n/2\rceil) + f(\lfloor n/2\rfloor) + n - 1, \qquad \text{for } n > 1.
\end{aligned}
\tag{3.18}
$$

A solution to this recurrence appears in exercise 34.
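To see (3.18) as an upper bound in practice, here is a minimal merge sort in Python that splits into parts of sizes $\lceil n/2\rceil$ and $\lfloor n/2\rfloor$ and counts comparisons. The random test data, the seed, and the function names are assumptions for illustration only.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n: int) -> int:
    """Bound (3.18): f(1) = 0, f(n) = f(ceil(n/2)) + f(floor(n/2)) + n - 1."""
    if n == 1:
        return 0
    return f((n + 1) // 2) + f(n // 2) + n - 1


def merge_sort(items: list, counter: list) -> list:
    """Sort items, adding the number of key comparisons to counter[0]."""
    if len(items) <= 1:
        return list(items)
    half = (len(items) + 1) // 2               # first part gets ceil(n/2) records
    left = merge_sort(items[:half], counter)
    right = merge_sort(items[half:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):    # at most n - 1 comparisons here
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


rng = random.Random(2024)
ok = True
for n in range(1, 300):
    data = [rng.randrange(1000) for _ in range(n)]
    counter = [0]
    ok = ok and merge_sort(data, counter) == sorted(data) and counter[0] <= f(n)
print(ok)
```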

The Josephus problem of Chapter 1 has a similar recurrence, which can be cast in the form

$$
\begin{aligned}
J(1) &= 1;\\
J(n) &= 2J(\lfloor n/2\rfloor) - (-1)^n, \qquad \text{for } n > 1.
\end{aligned}
$$
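The recurrence can be compared with a direct simulation of the circle. This is a minimal Python sketch with hypothetical function names, where `q` is the elimination step (2 for this recurrence).

```python
def josephus_recurrence(n: int) -> int:
    """J(1) = 1; J(n) = 2*J(floor(n/2)) - (-1)**n for n > 1."""
    if n == 1:
        return 1
    return 2 * josephus_recurrence(n // 2) - (-1) ** n


def josephus_simulate(n: int, q: int = 2) -> int:
    """Survivor when people 1..n stand in a circle and every qth one is eliminated."""
    people = list(range(1, n + 1))
    i = 0
    while len(people) > 1:
        i = (i + q - 1) % len(people)   # index of the next person eliminated
        people.pop(i)                   # counting resumes with the following person
    return people[0]


ok = all(josephus_recurrence(n) == josephus_simulate(n) for n in range(1, 500))
print(ok)
```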
We’ve got more tools to work with than we had in Chapter 1, so let’s consider the more authentic Josephus problem in which every third person is eliminated, instead of every second. If we apply the methods that worked in Chapter 1 to this more difficult problem, we wind up with a recurrence like

$$J_3(n) = \Bigl\lceil \tfrac32 J_3\bigl(\lfloor \tfrac23 n\rfloor\bigr) + a_n \Bigr\rceil \bmod n + 1,$$

where ‘mod’ is a function that we will be studying shortly, and where we have $a_n = -2$, $+1$, or $-\frac12$ according as $n \bmod 3 = 0$, 1, or 2. But this recurrence is too horrible to pursue.

There’s another approach to the Josephus problem that gives a much better setup. Whenever a person is passed over, we can assign a new number. Thus, 1 and 2 become $n + 1$ and $n + 2$, then 3 is executed; 4 and 5 become $n + 3$ and $n + 4$, then 6 is executed; …; $3k + 1$ and $3k + 2$ become $n + 2k + 1$ and $n + 2k + 2$, then $3k + 3$ is executed; … then $3n$ is executed (or left to survive). For example, when $n = 10$ the numbers are


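[Figure: the ten people in a circle, each labeled with the successive numbers assigned to them: 1, 11, 18; 2, 12; 3; 4, 13, 19, 23, 26, 28, 29, 30; 5, 14, 20, 24; 6; 7, 15; 8, 16, 21; 9; 10, 17, 22, 25, 27.]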






The kth person eliminated ends up with number 3k. So we can figure out who 
the survivor is if we can figure out the original number of person number 3n. 

If $N > n$, person number $N$ must have had a previous number, and we can find it as follows: We have $N = n + 2k + 1$ or $N = n + 2k + 2$, hence $k = \lfloor (N - n - 1)/2\rfloor$; the previous number was $3k + 1$ or $3k + 2$, respectively. That is, it was $3k + (N - n - 2k) = k + N - n$. Hence we can calculate the survivor’s number $J_3(n)$ as follows:


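$$
\begin{aligned}
&N := 3n;\\
&\textbf{while } N > n \textbf{ do } N := \left\lfloor \frac{N - n - 1}{2} \right\rfloor + N - n;\\
&J_3(n) := N.
\end{aligned}
$$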


This is not a closed form for $J_3(n)$; it’s not even a recurrence. But at least it tells us how to calculate the answer reasonably fast, if $n$ is large.
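The renumbering loop above translates directly into Python. This sketch (function names hypothetical) checks it against a brute-force simulation of the every-third-person circle.

```python
def j3_by_renumbering(n: int) -> int:
    """Survivor J_3(n): trace number 3n back through its earlier numbers."""
    N = 3 * n                           # the last person eliminated has number 3n
    while N > n:
        N = (N - n - 1) // 2 + N - n    # previous number k + N - n, k = floor((N-n-1)/2)
    return N


def j3_simulate(n: int) -> int:
    """Direct simulation: every third person around the circle is eliminated."""
    people = list(range(1, n + 1))
    i = 0
    while len(people) > 1:
        i = (i + 2) % len(people)
        people.pop(i)
    return people[0]


print(j3_by_renumbering(10))
print(all(j3_by_renumbering(n) == j3_simulate(n) for n in range(1, 300)))
```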


“Not too slow, not too fast.”

— L. Armstrong 





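Fortunately there’s a way to simplify this algorithm if we use the variable $D = 3n + 1 - N$ in place of $N$. (This change in notation corresponds to assigning numbers from $3n$ down to 1, instead of from 1 up to $3n$; it’s sort of like a countdown.) Then the complicated assignment to $N$ becomes

$$
\begin{aligned}
D &:= 3n + 1 - \left( \left\lfloor \frac{(3n + 1 - D) - n - 1}{2} \right\rfloor + (3n + 1 - D) - n \right)\\
&= n + D - \left\lfloor \frac{2n - D}{2} \right\rfloor = D - \left\lfloor \frac{-D}{2} \right\rfloor = D + \left\lceil \frac{D}{2} \right\rceil = \left\lceil \tfrac32 D \right\rceil,
\end{aligned}
$$

and we can rewrite the algorithm as follows:

$$
\begin{aligned}
&D := 1;\\
&\textbf{while } D \le 2n \textbf{ do } D := \lceil \tfrac32 D\rceil;\\
&J_3(n) := 3n + 1 - D.
\end{aligned}
$$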


Aha! This looks much nicer, because $n$ enters the calculation in a very simple way. In fact, we can show by the same reasoning that the survivor $J_q(n)$ when every $q$th person is eliminated can be calculated as follows:

$$
\begin{aligned}
&D := 1;\\
&\textbf{while } D \le (q - 1)n \textbf{ do } D := \left\lceil \tfrac{q}{q-1} D \right\rceil;\\
&J_q(n) := qn + 1 - D.
\end{aligned}
\tag{3.19}
$$

In the case $q = 2$ that we know so well, this makes $D$ grow to $2^{m+1}$ when $n = 2^m + l$; hence $J_2(n) = 2(2^m + l) + 1 - 2^{m+1} = 2l + 1$. Good.
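Recipe (3.19) and recurrence (3.20) translate into a few lines of integer arithmetic. In this sketch `-(-a // b)` computes $\lceil a/b\rceil$ exactly, the simulation is only a cross-check, and the function names are illustrative.

```python
def josephus_q(n: int, q: int) -> int:
    """Survivor J_q(n) by recipe (3.19); assumes n >= 1 and q >= 2."""
    D = 1
    while D <= (q - 1) * n:
        D = -(-q * D // (q - 1))        # D := ceil(q*D/(q-1)) in exact integers
    return q * n + 1 - D


def d_sequence(q: int, count: int) -> list:
    """D_0^(q), ..., D_{count-1}^(q) from recurrence (3.20)."""
    D = [1]
    while len(D) < count:
        D.append(-(-q * D[-1] // (q - 1)))
    return D


def josephus_simulate(n: int, q: int) -> int:
    """Direct simulation: every qth person around the circle is eliminated."""
    people = list(range(1, n + 1))
    i = 0
    while len(people) > 1:
        i = (i + q - 1) % len(people)
        people.pop(i)
    return people[0]


print(d_sequence(3, 10))
print(all(josephus_q(n, q) == josephus_simulate(n, q)
          for q in range(2, 7) for n in range(1, 150)))
```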

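The recipe in (3.19) computes a sequence of integers that can be defined by the following recurrence:

$$
\begin{aligned}
D_0^{(q)} &= 1;\\
D_n^{(q)} &= \left\lceil \frac{q}{q-1} D_{n-1}^{(q)} \right\rceil \qquad \text{for } n > 0.
\end{aligned}
\tag{3.20}
$$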


“Known” like, say, harmonic numbers. A. M. Odlyzko and H. S. Wilf have shown [283] that $D_n^{(3)} = \lfloor (\frac32)^n C\rfloor$, where $C \approx 1.622270503$.


These numbers don’t seem to relate to any familiar functions in a simple way, except when $q = 2$; hence they probably don’t have a nice closed form. But if we’re willing to accept the sequence $D_n^{(q)}$ as “known,” then it’s easy to describe the solution to the generalized Josephus problem: The survivor $J_q(n)$ is $qn + 1 - D_k^{(q)}$, where $k$ is as small as possible such that $D_k^{(q)} > (q - 1)n$.

3.4 ‘MOD’: THE BINARY OPERATION 

The quotient of $n$ divided by $m$ is $\lfloor n/m\rfloor$, when $m$ and $n$ are positive integers. It’s handy to have a simple notation also for the remainder of this division, and we call it ‘$n \bmod m$’. The basic formula

$$n = m\underbrace{\lfloor n/m\rfloor}_{\text{quotient}} + \underbrace{n \bmod m}_{\text{remainder}}$$

tells us that we can express $n \bmod m$ as $n - m\lfloor n/m\rfloor$. We can generalize this to negative integers, and in fact to arbitrary real numbers:

$$x \bmod y = x - y\lfloor x/y\rfloor, \qquad \text{for } y \ne 0. \tag{3.21}$$


This defines ‘mod’ as a binary operation, just as addition and subtraction are 
binary operations. Mathematicians have used mod this way informally for a 
long time, taking various quantities mod 10, mod 2π, and so on, but only in
the last twenty years has it caught on formally. Old notion, new notation. 

We can easily grasp the intuitive meaning of $x \bmod y$, when $x$ and $y$ are positive real numbers, if we imagine a circle of circumference $y$ whose points have been assigned real numbers in the interval $[0\,..\,y)$. If we travel a distance $x$ around the circle, starting at 0, we end up at $x \bmod y$. (And the number of times we encounter 0 as we go is $\lfloor x/y\rfloor$.)

When x or y is negative, we need to look at the definition carefully in 
order to see exactly what it means. Here are some integer-valued examples: 


$$
\begin{aligned}
5 \bmod 3 &= 5 - 3\lfloor 5/3\rfloor = 2;\\
5 \bmod -3 &= 5 - (-3)\lfloor 5/(-3)\rfloor = -1;\\
-5 \bmod 3 &= -5 - 3\lfloor -5/3\rfloor = 1;\\
-5 \bmod -3 &= -5 - (-3)\lfloor -5/(-3)\rfloor = -2.
\end{aligned}
$$
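Definition (3.21) in Python, as a minimal sketch: the helper `mod` is hypothetical, but it relies on the documented fact that Python’s `//` is floor division, so for nonzero integer `y` it agrees with the built-in `%`. By contrast `math.fmod` truncates toward zero, so its result takes the sign of `x`; that is one of the other definitions computer languages use.

```python
import math


def mod(x, y):
    """x mod y = x - y*floor(x/y) for y != 0, as in (3.21); x mod 0 = x, as in (3.22)."""
    if y == 0:
        return x
    return x - y * (x // y)      # // floors for int, float, and Fraction operands


examples = [(5, 3), (5, -3), (-5, 3), (-5, -3)]
for x, y in examples:
    print(f"{x} mod {y} = {mod(x, y)}   x % y = {x % y}   math.fmod = {math.fmod(x, y)}")
```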


Why do they call it ‘mod’: The Binary Operation? Stay tuned to find out in the next, exciting, chapter!


Beware of computer languages that use another definition.


The number after ‘mod’ is called the modulus; nobody has yet decided what
to call the number before ‘mod’. In applications, the modulus is usually 
positive, but the definition makes perfect sense when the modulus is negative. 
In both cases the value of x mod y is between 0 and the modulus: 


How about calling the other number the modumor?


$$
\begin{aligned}
0 \le x \bmod y &< y, \qquad \text{for } y > 0;\\
0 \ge x \bmod y &> y, \qquad \text{for } y < 0.
\end{aligned}
$$

What about $y = 0$? Definition (3.21) leaves this case undefined, in order to avoid division by zero, but to be complete we can define

$$x \bmod 0 = x. \tag{3.22}$$

This convention preserves the property that $x \bmod y$ always differs from $x$ by a multiple of $y$. (It might seem more natural to make the function continuous at 0, by defining $x \bmod 0 = \lim_{y\to 0} x \bmod y = 0$. But we’ll see in Chapter 4 that this would be much less useful. Continuity is not an important aspect of the mod operation.)

There was a time in the 70s when ‘mod’ was the fashion. Maybe the new mumble function should be called ‘punk’?

No, I like ‘mumble’.

The remainder, eh?

We’ve already seen one special case of mod in disguise, when we wrote $x$ in terms of its integer and fractional parts, $x = \lfloor x\rfloor + \{x\}$. The fractional part can also be written $x \bmod 1$, because we have

$$x = \lfloor x\rfloor + x \bmod 1.$$

Notice that parentheses aren’t needed in this formula; we take mod to bind more tightly than addition or subtraction.

The floor function has been used to define mod, and the ceiling function 
hasn’t gotten equal time. We could perhaps use the ceiling to define a mod 
analog like 

$$x \mathbin{\text{mumble}} y = y\lceil x/y\rceil - x;$$

in our circle analogy this represents the distance the traveler needs to continue, 
after going a distance x, to get back to the starting point 0. But of course 
we’d need a better name than ‘mumble’. If sufficient applications come along, 
an appropriate name will probably suggest itself. 

The distributive law is mod’s most important algebraic property: We have

$$c(x \bmod y) = (cx) \bmod (cy) \tag{3.23}$$

for all real $c$, $x$, and $y$. (Those who like mod to bind less tightly than multiplication may remove the parentheses from the right side here, too.) It’s easy to prove this law from definition (3.21), since

$$c(x \bmod y) = c(x - y\lfloor x/y\rfloor) = cx - cy\lfloor cx/cy\rfloor = cx \bmod cy,$$

if $cy \ne 0$; and the zero-modulus cases are trivially true. Our four examples using $\pm5$ and $\pm3$ illustrate this law twice, with $c = -1$. An identity like (3.23) is reassuring, because it gives us reason to believe that ‘mod’ has not been defined improperly.
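The distributive law (3.23) can be spot-checked with exact rationals, including the zero values of `c` and `y` handled by convention (3.22). This is a minimal sketch; the grid of sample values is an arbitrary choice.

```python
from fractions import Fraction


def mod(x, y):
    """(3.21), with the convention (3.22) that x mod 0 = x."""
    return x if y == 0 else x - y * (x // y)   # Fraction // Fraction floors exactly


values = [Fraction(a, 3) for a in range(-9, 10)]
ok = all(c * mod(x, y) == mod(c * x, c * y)
         for c in values for x in values for y in values)
print(ok)
```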

In the remainder of this section, we’ll consider an application in which 
‘mod’ turns out to be helpful although it doesn’t play a central role. The 
problem arises frequently in a variety of situations: We want to partition 
n things into m groups as equally as possible. 

Suppose, for example, that we have $n$ short lines of text that we’d like to arrange in $m$ columns. For aesthetic reasons, we want the columns to be arranged in decreasing order of length (actually nonincreasing order); and the lengths should be approximately the same: no two columns should differ by more than one line’s worth of text. If 37 lines of text are being divided into five columns, we would therefore prefer the arrangement on the right:


[Left arrangement, column lengths 8, 8, 8, 8, 5: lines 1–8 | lines 9–16 | lines 17–24 | lines 25–32 | lines 33–37]

[Right arrangement, column lengths 8, 8, 7, 7, 7: lines 1–8 | lines 9–16 | lines 17–23 | lines 24–30 | lines 31–37]


Furthermore we want to distribute the lines of text columnwise (first deciding how many lines go into the first column and then moving on to the second, the third, and so on) because that’s the way people read. Distributing row by row would give us the correct number of lines in each column, but the ordering would be wrong. (We would get something like the arrangement on the right, but column 1 would contain lines 1, 6, 11, …, 36, instead of lines 1, 2, 3, …, 8 as desired.)

A row-by-row distribution strategy can’t be used, but it does tell us how many lines to put in each column. If $n$ is not a multiple of $m$, the row-by-row procedure makes it clear that the long columns should each contain $\lceil n/m\rceil$ lines, and the short columns should each contain $\lfloor n/m\rfloor$. There will be exactly $n \bmod m$ long columns (and, as it turns out, there will be exactly $n \mathbin{\text{mumble}} m$ short ones).

Let’s generalize the terminology and talk about ‘things’ and ‘groups’ instead of ‘lines’ and ‘columns’. We have just decided that the first group should contain $\lceil n/m\rceil$ things; therefore the following sequential distribution scheme ought to work: To distribute $n$ things into $m$ groups, when $m > 0$, put $\lceil n/m\rceil$ things into one group, then use the same procedure recursively to put the remaining $n' = n - \lceil n/m\rceil$ things into $m' = m - 1$ additional groups.

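For example, if $n = 314$ and $m = 6$, the distribution goes like this:

$$
\begin{array}{ccc}
\text{remaining things} & \text{remaining groups} & \lceil \text{things}/\text{groups} \rceil\\
\hline
314 & 6 & 53\\
261 & 5 & 53\\
208 & 4 & 52\\
156 & 3 & 52\\
104 & 2 & 52\\
52 & 1 & 52
\end{array}
$$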


It works. We get groups of approximately the same size, even though the 
divisor keeps changing. 
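The sequential distribution scheme is a short loop. This Python sketch (the name `distribute` is hypothetical) reproduces the table for $n = 314$, $m = 6$ and the 37-line, five-column example.

```python
def distribute(n: int, m: int) -> list:
    """Split n things into m > 0 groups, putting ceil(things/groups) in each next group."""
    sizes = []
    while m > 0:
        size = -(-n // m)          # ceil(n/m), exact for integers
        sizes.append(size)
        n, m = n - size, m - 1     # continue with what remains
    return sizes


print(distribute(314, 6))
print(distribute(37, 5))
```

The group sizes come out nonincreasing, with the $n \bmod m$ larger groups first, as the next paragraph explains.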

Why does it work? In general we can suppose that $n = qm + r$, where $q = \lfloor n/m\rfloor$ and $r = n \bmod m$. The process is simple if $r = 0$: We put $\lceil n/m\rceil = q$ things into the first group and replace $n$ by $n' = n - q$, leaving $n' = qm'$ things to put into the remaining $m' = m - 1$ groups. And if $r > 0$, we put $\lceil n/m\rceil = q + 1$ things into the first group and replace $n$ by $n' = n - q - 1$, leaving $n' = qm' + r - 1$ things for subsequent groups. The new remainder is $r' = r - 1$, but $q$ stays the same. It follows that there will be $r$ groups with $q + 1$ things, followed by $m - r$ groups with $q$ things.

How many things are in the $k$th group? We’d like a formula that gives $\lceil n/m\rceil$ when $k \le n \bmod m$, and $\lfloor n/m\rfloor$ otherwise. It’s not hard to verify that

$$\left\lceil \frac{n - k + 1}{m} \right\rceil$$

has the desired properties, because this reduces to $q + \lceil (r - k + 1)/m\rceil$ if we write $n = qm + r$ as in the preceding paragraph; here $q = \lfloor n/m\rfloor$. We have $\lceil (r - k + 1)/m\rceil = [k \le r]$, if $1 \le k \le m$ and $0 \le r < m$. Therefore we can write an identity that expresses the partition of $n$ into $m$ as-equal-as-possible parts in nonincreasing order:

$$n = \left\lceil \frac{n}{m} \right\rceil + \left\lceil \frac{n-1}{m} \right\rceil + \cdots + \left\lceil \frac{n-m+1}{m} \right\rceil. \tag{3.24}$$


This identity is valid for all positive integers m, and for all integers n (whether positive, negative, or zero). We have already encountered the case m = 2 in (3.17), although we wrote it in a slightly different form, n = ⌈n/2⌉ + ⌊n/2⌋.

If we had wanted the parts to be in nondecreasing order, with the small groups coming before the larger ones, we could have proceeded in the same way but with ⌊n/m⌋ things in the first group. Then we would have derived the corresponding identity

$$n = \left\lfloor\frac nm\right\rfloor + \left\lfloor\frac{n+1}m\right\rfloor + \cdots + \left\lfloor\frac{n+m-1}m\right\rfloor. \qquad (3.25)$$


It’s possible to convert between (3.25) and (3.24) by using either (3.4) or the 
identity of exercise 12. 
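Both identities can be sanity-checked numerically. This Python sketch (function names ours) tests (3.24) and (3.25) over a finite range of m and n, negative n included; it illustrates the claims but of course proves nothing. Python’s `//` operator rounds toward −∞, so it computes ⌊n/m⌋ correctly for negative n as well.

```python
def ceil_div(n: int, m: int) -> int:
    """Exact ceiling of n/m for integers n and m > 0."""
    return -((-n) // m)


def parts_nonincreasing(n: int, m: int) -> list:
    """Terms of (3.24): ceil((n - k + 1)/m) for k = 1, ..., m."""
    return [ceil_div(n - k + 1, m) for k in range(1, m + 1)]


def parts_nondecreasing(n: int, m: int) -> list:
    """Terms of (3.25): floor((n + k)/m) for k = 0, ..., m - 1."""
    return [(n + k) // m for k in range(m)]


# Both identities should hold for every integer n and every m > 0.
for m in range(1, 13):
    for n in range(-50, 51):
        down, up = parts_nonincreasing(n, m), parts_nondecreasing(n, m)
        assert sum(down) == n and sum(up) == n
        # The same as-equal-as-possible parts, listed in opposite orders.
        assert down == sorted(down, reverse=True) and up == down[::-1]

print(parts_nonincreasing(314, 6), parts_nondecreasing(314, 6))
```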

Now if we replace n in (3.25) by ⌊mx⌋, and apply rule (3.11) to remove floors inside of floors, we get an identity that holds for all real x:

$$\lfloor mx\rfloor = \lfloor x\rfloor + \left\lfloor x+\frac1m\right\rfloor + \cdots + \left\lfloor x+\frac{m-1}m\right\rfloor. \qquad (3.26)$$

Some claim that it’s
too dangerous to
replace anything by
an mx.

This is rather amazing, because the floor function is an integer approximation of a real value, but the single approximation on the left equals the sum of a bunch of them on the right. If we assume that ⌊x⌋ is roughly x − ½ on the average, the left-hand side is roughly mx − ½, while the right-hand side comes to roughly (x − ½) + (x − ½ + 1/m) + ··· + (x − ½ + (m − 1)/m) = mx − ½; the sum of all these rough approximations turns out to be exact!
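Identity (3.26) can be spot-checked on rational values of x, which stand in for arbitrary reals in this sketch (an assumption; a finite check proves nothing). Python’s `fractions.Fraction` keeps every comparison exact, including the points x = k/m where the floors jump; the function names are ours.

```python
from fractions import Fraction
from math import floor


def lhs(x: Fraction, m: int) -> int:
    """Left side of (3.26): floor(m x)."""
    return floor(m * x)


def rhs(x: Fraction, m: int) -> int:
    """Right side of (3.26): floor(x) + floor(x + 1/m) + ... + floor(x + (m-1)/m)."""
    return sum(floor(x + Fraction(k, m)) for k in range(m))


# Exact rationals avoid the rounding errors that floats would add
# right at the jump points of the floor function.
samples = [Fraction(p, q) for q in range(1, 13) for p in range(-60, 61)]
assert all(lhs(x, m) == rhs(x, m) for x in samples for m in range(1, 11))
print(lhs(Fraction(7, 3), 4), rhs(Fraction(7, 3), 4))  # 9 9
```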
3.5 FLOOR/CEILING SUMS

Equation (3.26) demonstrates that it’s possible to get a closed form for at least one kind of sum that involves ⌊ ⌋. Are there others? Yes. The trick that usually works in such cases is to get rid of the floor or ceiling by introducing a new variable.

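For example, let’s see if it’s possible to do the sum

$$\sum_{0\le k<n}\lfloor\sqrt k\,\rfloor$$

in closed form. One idea is to introduce the variable m = ⌊√k⌋; we can do this “mechanically” by proceeding as we did in the roulette problem:

$$\begin{aligned}
\sum_{0\le k<n}\lfloor\sqrt k\,\rfloor
&= \sum_{k,m\ge0} m\,[k<n]\,\bigl[m=\lfloor\sqrt k\,\rfloor\bigr] \\
&= \sum_{k,m\ge0} m\,[k<n]\,\bigl[m\le\sqrt k<m+1\bigr] \\
&= \sum_{k,m\ge0} m\,[k<n]\,\bigl[m^2\le k<(m+1)^2\bigr] \\
&= \sum_{k,m\ge0} m\,\bigl[m^2\le k<(m+1)^2\le n\bigr]
 + \sum_{k,m\ge0} m\,\bigl[m^2\le k<n<(m+1)^2\bigr].
\end{aligned}$$

Once again the boundary conditions are a bit delicate. Let’s assume first that n = a² is a perfect square. Then the second sum is zero, and the first can be evaluated by our usual routine:

$$\begin{aligned}
\sum_{k,m\ge0} m\,\bigl[m^2\le k<(m+1)^2\le a^2\bigr]
&= \sum_{m\ge0} m\bigl((m+1)^2-m^2\bigr)[m+1\le a] \\
&= \sum_{m\ge0} m(2m+1)[m<a] \\
&= \sum_{m\ge0} \bigl(2m^{\underline2}+3m^{\underline1}\bigr)[m<a] \\
&= \sum\nolimits_0^a \bigl(2m^{\underline2}+3m^{\underline1}\bigr)\,\delta m \\
&= \tfrac23a(a-1)(a-2) + \tfrac32a(a-1) = \tfrac16(4a+1)a(a-1).
\end{aligned}$$

Falling powers
make the sum come
tumbling down.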
In the general case we can let a = ⌊√n⌋; then we merely need to add the terms for a² ≤ k < n, which are all equal to a, so they sum to (n − a²)a. This gives the desired closed form,

$$\sum_{0\le k<n}\lfloor\sqrt k\,\rfloor = na - \tfrac13a^3 - \tfrac12a^2 - \tfrac16a, \qquad a = \lfloor\sqrt n\,\rfloor. \qquad (3.27)$$
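To check (3.27) numerically, note that ⅓a³ + ½a² + ⅙a = a(a + 1)(2a + 1)/6, which is always an integer, so the closed form can be evaluated in exact integer arithmetic. The sketch below (function names ours) compares it with direct summation; `math.isqrt` computes ⌊√k⌋ exactly, without floating point.

```python
from math import isqrt


def sum_floor_sqrt_brute(n: int) -> int:
    """Direct evaluation of sum_{0 <= k < n} floor(sqrt(k))."""
    return sum(isqrt(k) for k in range(n))


def sum_floor_sqrt_closed(n: int) -> int:
    """Closed form (3.27): n a - a^3/3 - a^2/2 - a/6 with a = floor(sqrt(n)).
    The last three terms equal a(a+1)(2a+1)/6, an integer, so // is exact."""
    a = isqrt(n)
    return n * a - a * (a + 1) * (2 * a + 1) // 6


assert all(sum_floor_sqrt_brute(n) == sum_floor_sqrt_closed(n) for n in range(600))
print(sum_floor_sqrt_closed(10))  # 16
```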


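Another approach to such sums is to replace an expression of the form ⌊x⌋ by Σ_j [1 ≤ j ≤ x]; this is legal whenever x ≥ 0. Here’s how that method works in the sum of ⌊square roots⌋, if we assume for convenience that n = a²:

$$\begin{aligned}
\sum_{0\le k<a^2}\lfloor\sqrt k\,\rfloor
&= \sum_{j,k}\bigl[1\le j\le\sqrt k\,\bigr][0\le k<a^2] \\
&= \sum_{1\le j<a}\ \sum_k\bigl[j^2\le k<a^2\bigr] \\
&= \sum_{1\le j<a}\bigl(a^2-j^2\bigr) = a^3 - \tfrac13a\bigl(a+\tfrac12\bigr)(a+1).
\end{aligned}$$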


Now here’s another example where a change of variable leads to a transformed sum. A remarkable theorem was discovered independently by three mathematicians (Bohl [34], Sierpiński [326], and Weyl [368]) at about the same time in 1909: If α is irrational then the fractional parts {nα} are very uniformly distributed between 0 and 1, as n → ∞. One way to state this is that

$$\lim_{n\to\infty}\frac1n\sum_{0\le k<n} f\bigl(\{k\alpha\}\bigr) = \int_0^1 f(x)\,dx \qquad (3.28)$$


Warning: This stuff 
is fairly advanced. 
Better skim the 
next two pages on 
first reading; they
aren’t crucial. 

— Friendly TA 


Start 

Skimming 


for all irrational α and all functions f that are continuous almost everywhere. For example, the average value of {nα} can be found by setting f(x) = x; we get ½. (That’s exactly what we might expect; but it’s nice to know that it is really, provably true, no matter how irrational α is.)
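The statement (3.28) is easy to watch in action. This floating-point sketch (names ours) averages {kα} and {kα}² for α = √2; the results should drift toward ∫₀¹ x dx = ½ and ∫₀¹ x² dx = ⅓. A double-precision `math.sqrt(2)` is itself rational, so this only illustrates the theorem for moderate n.

```python
import math


def weyl_average(alpha: float, n: int, f) -> float:
    """(1/n) * sum_{0 <= k < n} f({k alpha}), the quantity whose limit is taken in (3.28)."""
    return sum(f((k * alpha) % 1.0) for k in range(n)) / n


alpha = math.sqrt(2)  # an illustrative irrational; any other would do
for n in (10, 100, 1000, 10000):
    mean = weyl_average(alpha, n, lambda x: x)         # should approach 1/2
    mean_sq = weyl_average(alpha, n, lambda x: x * x)  # should approach 1/3
    print(n, round(mean, 4), round(mean_sq, 4))
```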

The theorem of Bohl, Sierpiński, and Weyl is proved by approximating f(x) above and below by “step functions,” which are linear combinations of the simple functions

$$f_v(x) = [0 \le x < v]$$

when 0 ≤ v ≤ 1. Our purpose here is not to prove the theorem; that’s a job for calculus books. But let’s try to figure out the basic reason why it holds, by seeing how well it works in the special case f(x) = f_v(x). In other words, let’s try to see how close the sum

$$\sum_{0\le k<n}\bigl[\{k\alpha\} < v\bigr]$$

gets to the “ideal” value nv, when n is large and α is irrational.
For this purpose we define the discrepancy D(α, n) to be the maximum absolute value, over all 0 ≤ v ≤ 1, of the sum

$$s(\alpha,n,v) = \sum_{0\le k<n}\bigl([\{k\alpha\}<v] - v\bigr). \qquad (3.29)$$

Our goal is to show that D(α, n) is “not too large” when compared with n, by showing that |s(α, n, v)| is always reasonably small when α is irrational.

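First we can rewrite s(α, n, v) in simpler form, then introduce a new index variable j:

$$\begin{aligned}
\sum_{0\le k<n}\bigl([\{k\alpha\}<v]-v\bigr)
&= \sum_{0\le k<n}\bigl(\lfloor k\alpha\rfloor - \lfloor k\alpha-v\rfloor - v\bigr) \\
&= -nv + \sum_{0\le k<n}\ \sum_j\,[k\alpha-v < j \le k\alpha] \\
&= -nv + \sum_{0\le j<\lceil n\alpha\rceil}\ \sum_{k<n}\,[j\alpha^{-1} \le k < (j+v)\alpha^{-1}].
\end{aligned}$$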


If we’re lucky, we can do the sum on k. But we ought to introduce some new variables, so that the formula won’t be such a mess. Without loss of generality, we can assume that 0 < α < 1; let us write

$$\begin{aligned}
a &= \lfloor\alpha^{-1}\rfloor, & \alpha^{-1} &= a + \alpha'; \\
b &= \lceil v\alpha^{-1}\rceil, & v\alpha^{-1} &= b - v'.
\end{aligned}$$

Thus α' = {α⁻¹} is the fractional part of α⁻¹, and v' is the mumble-fractional part of vα⁻¹.

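Once again the boundary conditions are our only source of grief. For now, let’s forget the restriction ‘k < n’ and evaluate the sum on k without it:

$$\begin{aligned}
\sum_k\bigl[k\in[\,j\alpha^{-1}\,.\,.\,(j+v)\alpha^{-1})\bigr]
&= \bigl\lceil (j+v)(a+\alpha')\bigr\rceil - \bigl\lceil j(a+\alpha')\bigr\rceil \\
&= b + \lceil j\alpha'-v'\rceil - \lceil j\alpha'\rceil.
\end{aligned}$$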


OK, that’s pretty simple; we plug it in and plug away:

$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b + \sum_{0\le j<\lceil n\alpha\rceil}\bigl(\lceil j\alpha'-v'\rceil - \lceil j\alpha'\rceil\bigr) - S, \qquad (3.30)$$

where S is a correction for the cases with k ≥ n that we have failed to exclude. The quantity jα' will never be an integer, since α (hence α') is irrational; and jα' − v' will be an integer for at most one value of j. So we can change the ceiling terms to floors:

$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b - \sum_{0\le j<\lceil n\alpha\rceil}\bigl(\lfloor j\alpha'\rfloor - \lfloor j\alpha'-v'\rfloor\bigr) - S + \{0\text{ or }1\}.$$


Right, name and
conquer.

The change of variable
from k to j is
the main point.

— Friendly TA


(The formula
{0 or 1} stands
for something that’s
either 0 or 1; we
needn’t commit
ourselves, because
the details don’t
really matter.)


Stop

Skimming


Interesting. Instead of a closed form, we’re getting a sum that looks rather like s(α, n, v) but with different parameters: α' instead of α, ⌈nα⌉ instead of n, and v' instead of v. So we’ll have a recurrence for s(α, n, v), which (hopefully) will lead to a recurrence for the discrepancy D(α, n). This means we want to get

$$s(\alpha',\lceil n\alpha\rceil,v') = \sum_{0\le j<\lceil n\alpha\rceil}\bigl(\lfloor j\alpha'\rfloor - \lfloor j\alpha'-v'\rfloor - v'\bigr)$$

into the act:

$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b - \lceil n\alpha\rceil v' - s(\alpha',\lceil n\alpha\rceil,v') - S + \{0\text{ or }1\}.$$

Recalling that b − v' = vα⁻¹, we see that everything will simplify beautifully if we replace ⌈nα⌉(b − v') by nα(b − v') = nv:

$$s(\alpha,n,v) = -s(\alpha',\lceil n\alpha\rceil,v') - S + \epsilon + \{0\text{ or }1\}.$$

Here ε is a positive error of at most vα⁻¹. Exercise 18 proves that S is, similarly, between 0 and ⌈vα⁻¹⌉. And we can remove the term for j = ⌈nα⌉ − 1 = ⌊nα⌋ from the sum, since it contributes either v' or v' − 1. Hence, if we take the maximum of absolute values over all v, we get

$$D(\alpha,n) \le D(\alpha',\lfloor\alpha n\rfloor) + \alpha^{-1} + 2. \qquad (3.31)$$

The methods we’ll learn in succeeding chapters will allow us to conclude from this recurrence that D(α, n) is always much smaller than n, when n is sufficiently large. Hence the theorem (3.28) is not only true, it can also be strengthened: Convergence to the limit is very fast.
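The discrepancy can be computed directly from its definition, because s(α, n, v) is piecewise linear in v. The Python sketch below uses our own function names, a floating-point α, and the assumption that the values {kα} are distinct (as they are for irrational α). It computes the supremum of |s| and then prints D(α, n), the ratio D(α, n)/n, and the right-hand side of (3.31) for α = √2 − 1, whose α' = {1/α} equals α again. It is an illustration, not part of the argument.

```python
import math


def frac(x: float) -> float:
    """Fractional part {x} = x - floor(x)."""
    return x - math.floor(x)


def s(alpha: float, n: int, v: float) -> float:
    """The sum (3.29): sum_{0 <= k < n} ([{k alpha} < v] - v)."""
    return sum((1 if frac(k * alpha) < v else 0) - v for k in range(n))


def discrepancy(alpha: float, n: int) -> float:
    """Supremum over 0 <= v <= 1 of |s(alpha, n, v)|.
    Let x_0 < x_1 < ... be the sorted values {k alpha} (assumed distinct).
    For v in (x_{i-1}, x_i] exactly i of them are < v, so s = i - n v is
    linear there and |s| is largest at one of the two endpoints."""
    xs = sorted(frac(k * alpha) for k in range(n)) + [1.0]
    best, prev = 0.0, 0.0
    for i, x in enumerate(xs):
        best = max(best, abs(i - n * prev), abs(i - n * x))
        prev = x
    return best


alpha = math.sqrt(2) - 1       # illustrative choice with 0 < alpha < 1
alpha_prime = frac(1 / alpha)  # alpha' = {1/alpha}; here again sqrt(2) - 1
for n in (100, 1000, 10000):
    d = discrepancy(alpha, n)
    rhs = discrepancy(alpha_prime, math.floor(alpha * n)) + 1 / alpha + 2
    print(n, round(d, 3), round(d / n, 5), round(rhs, 3))
```

The ratio D(α, n)/n shrinks rapidly, in line with the claim that convergence is fast.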

Whew; that was quite an exercise in manipulation of sums, floors, and 
ceilings. Readers who are not accustomed to “proving that errors are small” 
might find it hard to believe that anybody would have the courage to keep 
going, when faced with such weird-looking sums. But actually, a second look 
shows that there’s a simple motivating thread running through the whole 
calculation. The main idea is that a certain sum s(α, n, v) of n terms can be reduced to a similar sum of at most ⌈αn⌉ terms. Everything else cancels out except for a small residual left over from terms near the boundaries.

Let’s take a deep breath now and do one more sum, which is not trivial 
but has the great advantage (compared with what we’ve just been doing) that 
it comes out in closed form so that we can easily check the answer. Our goal now will be to generalize the sum in (3.26) by finding an expression for

$$\sum_{0\le k<m}\left\lfloor\frac{nk+x}m\right\rfloor, \qquad \text{integer } m > 0,\ \text{integer } n.$$


Finding a closed form for this sum is tougher than what we’ve done so far 
(except perhaps for the discrepancy problem we just looked at). But it’s 
instructive, so we’ll hack away at it for the rest of this chapter. 

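As usual, especially with tough problems, we start by looking at small cases. The special case n = 1 is (3.26), with x replaced by x/m:

$$\left\lfloor\frac xm\right\rfloor + \left\lfloor\frac{1+x}m\right\rfloor + \cdots + \left\lfloor\frac{m-1+x}m\right\rfloor = \lfloor x\rfloor.$$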

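And as in Chapter 1, we find it useful to get more data by generalizing downwards to the case n = 0:

$$\left\lfloor\frac xm\right\rfloor + \left\lfloor\frac xm\right\rfloor + \cdots + \left\lfloor\frac xm\right\rfloor = m\left\lfloor\frac xm\right\rfloor.$$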

Our problem has two parameters, m and n; let’s look at some small cases for m. When m = 1 there’s just a single term in the sum and its value is ⌊x⌋. When m = 2 the sum is ⌊x/2⌋ + ⌊(x + n)/2⌋. We can remove the interaction between x and n by removing n from inside the floor function, but to do that we must consider even and odd n separately. If n is even, n/2 is an integer, so we can remove it from the floor:

$$\left\lfloor\frac x2\right\rfloor + \left(\left\lfloor\frac x2\right\rfloor + \frac n2\right) = 2\left\lfloor\frac x2\right\rfloor + \frac n2.$$

If n is odd, (n − 1)/2 is an integer so we get

$$\left\lfloor\frac x2\right\rfloor + \left(\left\lfloor\frac{x+1}2\right\rfloor + \frac{n-1}2\right) = \lfloor x\rfloor + \frac{n-1}2.$$

The last step follows from (3.26) with m = 2.

These formulas for even and odd n slightly resemble those for n = 0 and 1, but no clear pattern has emerged yet; so we had better continue exploring some more small cases. For m = 3 the sum is

$$\left\lfloor\frac x3\right\rfloor + \left\lfloor\frac{x+n}3\right\rfloor + \left\lfloor\frac{x+2n}3\right\rfloor,$$

and we consider three cases for n: Either it’s a multiple of 3, or it’s 1 more than a multiple, or it’s 2 more. That is, n mod 3 = 0, 1, or 2. If n mod 3 = 0


Is this a harder sum 
of floors, or a sum 
of harder floors? 


Be forewarned: This 
is the beginning of 
a pattern, in that 
the last part of the 
chapter consists 
of the solution of 
some long, difficult 
problem, with little 
more motivation 
than curiosity. 

— Students 

Touché. But c’mon,
gang, do you always 
need to be told 
about applications 
before you can get 
interested in some- 
thing? This sum 
arises, for example, 
in the study of 
random number 
generation and 
testing. But math- 
ematicians looked 
at it long before 
computers came 
along, because they 
found it natural to 
ask if there’s a way 
to sum arithmetic 
progressions that 
have been “floored.”

— Your instructor 
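then n/3 and 2n/3 are integers, so the sum is

$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac x3\right\rfloor + \frac n3\right) + \left(\left\lfloor\frac x3\right\rfloor + \frac{2n}3\right) = 3\left\lfloor\frac x3\right\rfloor + n.$$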


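If n mod 3 = 1 then (n − 1)/3 and (2n − 2)/3 are integers, so we have

$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac{x+1}3\right\rfloor + \frac{n-1}3\right) + \left(\left\lfloor\frac{x+2}3\right\rfloor + \frac{2n-2}3\right) = \lfloor x\rfloor + n - 1.$$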


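Again this last step follows from (3.26), this time with m = 3. And finally, if n mod 3 = 2 then

$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac{x+2}3\right\rfloor + \frac{n-2}3\right) + \left(\left\lfloor\frac{x+1}3\right\rfloor + \frac{2n-1}3\right) = \lfloor x\rfloor + n - 1.$$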


“Inventive genius 
requires pleasurable 
mental activity as 
a condition for its 
vigorous exercise. 
‘Necessity is the 
mother of invention’ 
is a silly proverb. 
‘Necessity is the 
mother of futile 
dodges’ is much 
nearer to the truth. 
The basis of the 
growth of modern 
invention is science, 
and science is al- 
most wholly the 
outgrowth of plea- 
surable intellectual 
curiosity.” 

—A.N. White- 
head [371] 


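The left hemispheres of our brains have finished the case m = 3, but the right hemispheres still can’t recognize the pattern, so we proceed to m = 4:

$$\left\lfloor\frac x4\right\rfloor + \left\lfloor\frac{x+n}4\right\rfloor + \left\lfloor\frac{x+2n}4\right\rfloor + \left\lfloor\frac{x+3n}4\right\rfloor.$$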


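At least we know enough by now to consider cases based on n mod m. If n mod 4 = 0 then

$$\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac x4\right\rfloor + \frac n4\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{2n}4\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{3n}4\right) = 4\left\lfloor\frac x4\right\rfloor + \frac32n.$$

And if n mod 4 = 1,

$$\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac{x+1}4\right\rfloor + \frac{n-1}4\right) + \left(\left\lfloor\frac{x+2}4\right\rfloor + \frac{2n-2}4\right) + \left(\left\lfloor\frac{x+3}4\right\rfloor + \frac{3n-3}4\right) = \lfloor x\rfloor + \frac32n - \frac32.$$

The case n mod 4 = 3 turns out to give the same answer. Finally, in the case n mod 4 = 2 we get something a bit different, and this turns out to be an important clue to the behavior in general:

$$\begin{aligned}
&\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac{x+2}4\right\rfloor + \frac{n-2}4\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{2n}4\right) + \left(\left\lfloor\frac{x+2}4\right\rfloor + \frac{3n-2}4\right) \\
&\qquad = 2\left(\left\lfloor\frac x4\right\rfloor + \left\lfloor\frac{x+2}4\right\rfloor\right) + \frac32n - 1
 = 2\left\lfloor\frac x2\right\rfloor + \frac32n - 1.
\end{aligned}$$

This last step simplifies something of the form ⌊y/2⌋ + ⌊(y + 1)/2⌋, which again is a special case of (3.26).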
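To summarize, here’s the value of our sum for small m:

$$\begin{array}{c|cccc}
m & n \bmod m = 0 & n \bmod m = 1 & n \bmod m = 2 & n \bmod m = 3 \\ \hline
1 & \lfloor x\rfloor & & & \\
2 & 2\lfloor x/2\rfloor + \frac12 n & \lfloor x\rfloor + \frac12 n - \frac12 & & \\
3 & 3\lfloor x/3\rfloor + n & \lfloor x\rfloor + n - 1 & \lfloor x\rfloor + n - 1 & \\
4 & 4\lfloor x/4\rfloor + \frac32 n & \lfloor x\rfloor + \frac32 n - \frac32 & 2\lfloor x/2\rfloor + \frac32 n - 1 & \lfloor x\rfloor + \frac32 n - \frac32
\end{array}$$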


It looks as if we’re getting something of the form

$$a\left\lfloor\frac xa\right\rfloor + bn + c,$$

where a, b, and c somehow depend on m and n. Even the myopic among us can see that b is probably (m − 1)/2. It’s harder to discern an expression
for a; but the case n mod 4 = 2 gives us a hint that a is probably gcd(m, n), 
the greatest common divisor of m and n. This makes sense because gcd(m, n) 
is the factor we remove from m and n when reducing the fraction n/m to 
lowest terms, and our sum involves the fraction n/m. (We’ll look carefully 
at gcd operations in Chapter 4.) The value of c seems more mysterious, but 
perhaps it will drop out of our proofs for a and b. 

In computing the sum for small m, we’ve effectively rewritten each term of the sum as

$$\left\lfloor\frac{x+kn}m\right\rfloor = \left\lfloor\frac{x+kn\bmod m}m\right\rfloor + \frac{kn}m - \frac{kn\bmod m}m,$$

because (kn − kn mod m)/m is an integer that can be removed from inside the floor brackets. Thus the original sum can be expanded into the following tableau:


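$$\begin{array}{llll}
\left\lfloor\dfrac{x}{m}\right\rfloor & +\ \dfrac{0}{m} & -\ \dfrac{0\bmod m}{m} \\[2ex]
+\left\lfloor\dfrac{x+n\bmod m}{m}\right\rfloor & +\ \dfrac{n}{m} & -\ \dfrac{n\bmod m}{m} \\[2ex]
+\left\lfloor\dfrac{x+2n\bmod m}{m}\right\rfloor & +\ \dfrac{2n}{m} & -\ \dfrac{2n\bmod m}{m} \\[1ex]
\quad\vdots & \quad\vdots & \quad\vdots \\[1ex]
+\left\lfloor\dfrac{x+(m-1)n\bmod m}{m}\right\rfloor & +\ \dfrac{(m-1)n}{m} & -\ \dfrac{(m-1)n\bmod m}{m}
\end{array}$$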
Lemma now,
dilemma later. 


When we experimented with small values of m, these three columns led respectively to a⌊x/a⌋, bn, and c.

In particular, we can see how b arises. The second column is an arithmetic progression, whose sum we know: it’s the average of the first and last terms, times the number of terms:

$$\frac12\left(0 + \frac{(m-1)n}m\right)\cdot m = \frac{(m-1)n}2.$$

So our guess that b = (m − 1)/2 has been verified.

The first and third columns seem tougher; to determine a and c we must 
take a closer look at the sequence of numbers 


0 mod m, n mod m, 2n mod m, …, (m − 1)n mod m.


Suppose, for example, that m = 12 and n = 5. If we think of the sequence as times on a clock, the numbers are 0 o’clock (we take 12 o’clock to be 0 o’clock), then 5 o’clock, 10 o’clock, 3 o’clock (= 15 o’clock), 8 o’clock, and so on. It turns out that we hit every hour exactly once.

Now suppose m = 12 and n = 8. The numbers are 0 o’clock, 8 o’clock, 4 o’clock (= 16 o’clock), but then 0, 8, and 4 repeat. Since both 8 and 12 are multiples of 4, and since the numbers start at 0 (also a multiple of 4), there’s no way to break out of this pattern; they must all be multiples of 4.

In these two cases we have gcd(12, 5) = 1 and gcd(12, 8) = 4. The general rule, which we will prove next chapter, states that if d = gcd(m, n) then we get the numbers 0, d, 2d, …, m − d in some order, followed by d − 1 more copies of the same sequence. For example, with m = 12 and n = 8 the pattern 0, 8, 4 occurs four times.
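The clock-number rule can be tested before it is proved. In this Python sketch (names ours), `follows_gcd_rule` checks the full statement above: the first m/d numbers are some ordering of 0, d, …, m − d, and the whole sequence is that block repeated d times.

```python
from math import gcd


def clock_sequence(m: int, n: int) -> list:
    """The numbers 0 mod m, n mod m, 2n mod m, ..., (m-1)n mod m."""
    return [k * n % m for k in range(m)]


def follows_gcd_rule(m: int, n: int) -> bool:
    """True if, with d = gcd(m, n), the sequence is some ordering of
    0, d, 2d, ..., m - d, followed by d - 1 more copies of that same block."""
    d = gcd(m, n)
    seq = clock_sequence(m, n)
    block = seq[: m // d]
    return sorted(block) == list(range(0, m, d)) and seq == block * d


print(clock_sequence(12, 5))  # [0, 5, 10, 3, 8, 1, 6, 11, 4, 9, 2, 7]
print(clock_sequence(12, 8))  # [0, 8, 4, 0, 8, 4, 0, 8, 4, 0, 8, 4]
print(all(follows_gcd_rule(m, n) for m in range(1, 50) for n in range(-50, 51)))
```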

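The first column of our sum now makes complete sense. It contains d copies of the terms ⌊x/m⌋, ⌊(x + d)/m⌋, …, ⌊(x + m − d)/m⌋, in some order, so its sum is

$$\begin{aligned}
d\left(\left\lfloor\frac xm\right\rfloor + \left\lfloor\frac{x+d}m\right\rfloor + \cdots + \left\lfloor\frac{x+m-d}m\right\rfloor\right)
&= d\left(\left\lfloor\frac{x/d}{m/d}\right\rfloor + \left\lfloor\frac{x/d+1}{m/d}\right\rfloor + \cdots + \left\lfloor\frac{x/d+m/d-1}{m/d}\right\rfloor\right) \\
&= d\left\lfloor\frac xd\right\rfloor.
\end{aligned}$$

This last step is yet another application of (3.26). Our guess for a has been verified:

$$a = d = \gcd(m,n).$$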
Also, as we guessed, we can now compute c, because the third column has become easy to fathom. It contains d copies of the arithmetic progression 0/m, d/m, 2d/m, …, (m − d)/m, so its sum is

$$d\left(\frac12\left(0 + \frac{m-d}m\right)\cdot\frac md\right) = \frac{m-d}2;$$

the third column is actually subtracted, not added, so we have

$$c = \frac{d-m}2.$$


End of mystery, end of quest. The desired closed form is

$$\sum_{0\le k<m}\left\lfloor\frac{nk+x}m\right\rfloor = d\left\lfloor\frac xd\right\rfloor + \frac{m-1}2\,n + \frac{d-m}2,$$

where d = gcd(m, n). As a check, we can make sure this works in the special cases n = 0 and n = 1 that we knew before: When n = 0 we get d = gcd(m, 0) = m; the last two terms of the formula are zero so the formula properly gives m⌊x/m⌋. And for n = 1 we get d = gcd(m, 1) = 1; the last two terms cancel nicely, and the sum is just ⌊x⌋.

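By manipulating the closed form a bit, we can actually make it symmetric in m and n:

$$\begin{aligned}
\sum_{0\le k<m}\left\lfloor\frac{nk+x}m\right\rfloor
&= d\left\lfloor\frac xd\right\rfloor + \frac{m-1}2\,n + \frac{d-m}2 \\
&= d\left\lfloor\frac xd\right\rfloor + \frac{(m-1)(n-1)}2 + \frac{m-1}2 + \frac{d-m}2 \\
&= d\left\lfloor\frac xd\right\rfloor + \frac{(m-1)(n-1)}2 + \frac{d-1}2. \qquad (3.32)
\end{aligned}$$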


This is astonishing, because there’s no algebraic reason to suspect that such a sum should be symmetrical. We have proved a “reciprocity law,”

$$\sum_{0\le k<m}\left\lfloor\frac{nk+x}m\right\rfloor = \sum_{0\le k<n}\left\lfloor\frac{mk+x}n\right\rfloor, \qquad \text{integers } m, n > 0.$$

For example, if m = 41 and n = 127, the left sum has 41 terms and the right has 127; but they still come out equal, for all real x.
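A numerical check of the closed form (3.32) and of the reciprocity law, in Python with exact rational arithmetic standing in for real x (function names ours; a finite check illustrates but does not prove).

```python
from fractions import Fraction
from math import floor, gcd


def floor_sum(m: int, n: int, x) -> int:
    """sum_{0 <= k < m} floor((n k + x)/m), evaluated term by term (m > 0)."""
    x = Fraction(x)  # exact arithmetic: no rounding at the floor's jump points
    return sum(floor((n * k + x) / m) for k in range(m))


def closed_form(m: int, n: int, x) -> Fraction:
    """Right side of (3.32): d floor(x/d) + (m-1)(n-1)/2 + (d-1)/2, d = gcd(m, n)."""
    x = Fraction(x)
    d = gcd(m, n)
    return d * floor(x / d) + Fraction((m - 1) * (n - 1) + (d - 1), 2)


xs = [Fraction(p, 7) for p in range(-30, 31)]
assert all(floor_sum(m, n, x) == closed_form(m, n, x)
           for m in range(1, 10) for n in range(-9, 10) for x in xs)

# Reciprocity: 41 terms on one side, 127 on the other.
print(floor_sum(41, 127, Fraction(13, 5)), floor_sum(127, 41, Fraction(13, 5)))
print(floor_sum(41, 127, 0))  # 2520
```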


Yup, I’m floored. 
Exercises

Warmups 

1 When we analyzed the Josephus problem in Chapter 1, we represented an arbitrary positive integer n in the form n = 2^m + l, where 0 ≤ l < 2^m. Give explicit formulas for l and m as functions of n, using floor and/or ceiling brackets.

2 What is a formula for the nearest integer to a given real number x? In case of ties, when x is exactly halfway between two integers, give an expression that rounds (a) up, that is, to ⌈x⌉; (b) down, that is, to ⌊x⌋.

3 Evaluate ⌊⌊mα⌋n/α⌋, when m and n are positive integers and α is an irrational number greater than n.

4 The text describes problems at levels 1 through 5. What is a level 0 
problem? (This, by the way, is not a level 0 problem.) 

5 Find a necessary and sufficient condition that ⌊nx⌋ = n⌊x⌋, when n is a positive integer. (Your condition should involve {x}.)

6 Can something interesting be said about ⌊f(x)⌋ when f(x) is a continuous, monotonically decreasing function that takes integer values only when x is an integer?

7 Solve the recurrence

$$\begin{aligned}
X_n &= n, && \text{for } 0 \le n < m; \\
X_n &= X_{n-m} + 1, && \text{for } n \ge m.
\end{aligned}$$

8 Prove the Dirichlet box principle: If n objects are put into m boxes, some box must contain ≥ ⌈n/m⌉ objects, and some box must contain ≤ ⌊n/m⌋.

9 Egyptian mathematicians in 1800 B.C. represented rational numbers between 0 and 1 as sums of unit fractions 1/x_1 + ··· + 1/x_k, where the x’s were distinct positive integers. For example, they wrote 1/3 + 1/15 instead of 2/5. Prove that it is always possible to do this in a systematic way: If 0 < m/n < 1, then

$$\frac mn = \frac1q + \left\{\text{representation of } \frac mn - \frac1q\right\}, \qquad q = \left\lceil\frac nm\right\rceil.$$

(This is Fibonacci’s algorithm, due to Leonardo Fibonacci, A.D. 1202.)


You know you’re 
in college when the 
book doesn’t tell
you how to pro- 
nounce ‘Dirichlet’. 
Basics

10 Show that the expression

$$\left\lceil\frac{2x+1}2\right\rceil - \left\lceil\frac{2x+1}4\right\rceil + \left\lfloor\frac{2x+1}4\right\rfloor$$

is always either ⌊x⌋ or ⌈x⌉. In what circumstances does each case arise?

11 Give details of the proof alluded to in the text, that the open interval (α . . β) contains exactly ⌈β⌉ − ⌊α⌋ − 1 integers when α < β. Why does the case α = β have to be excluded in order to make the proof correct?

12 Prove that

$$\left\lceil\frac nm\right\rceil = \left\lfloor\frac{n+m-1}m\right\rfloor$$

for all integers n and all positive integers m. [This identity gives us another way to convert ceilings to floors and vice versa, instead of using the reflective law (3.4).]

13 Let α and β be positive real numbers. Prove that Spec(α) and Spec(β) partition the positive integers if and only if α and β are irrational and 1/α + 1/β = 1.

14 Prove or disprove:

(x mod ny) mod y = x mod y, integer n.

15 Is there an identity analogous to (3.26) that uses ceilings instead of floors?

16 Prove that n mod 2 = (1 − (−1)ⁿ)/2. Find and prove a similar expression for n mod 3 in the form a + bωⁿ + cω²ⁿ, where ω is the complex number (−1 + i√3)/2. Hint: ω³ = 1 and 1 + ω + ω² = 0.

17 Evaluate the sum Σ_{0≤k<m} ⌊x + k/m⌋ in the case x ≥ 0 by substituting Σ_j [1 ≤ j ≤ x + k/m] for ⌊x + k/m⌋ and summing first on k. Does your answer agree with (3.26)?

18 Prove that the boundary-value error term S in (3.30) is at most ⌈vα⁻¹⌉. Hint: Show that small values of j are not involved.

Homework exercises 

19 Find a necessary and sufficient condition on the real number b > 1 such that

$$\lfloor\log_b x\rfloor = \lfloor\log_b\lfloor x\rfloor\rfloor$$

for all real x ≥ 1.
20 Find the sum of all multiples of x in the closed interval [α . . β], when x > 0.

21 How many of the numbers 2^m, for 0 ≤ m ≤ M, have leading digit 1 in decimal notation?

22 Evaluate the sums

$$S_n = \sum_{k\ge1}\left\lfloor\frac n{2^k}+\frac12\right\rfloor \qquad\text{and}\qquad T_n = \sum_{k\ge1}2^k\left\lfloor\frac n{2^k}+\frac12\right\rfloor^2.$$

23 Show that the nth element of the sequence

1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, …

is ⌊√(2n) + ½⌋. (The sequence contains exactly m occurrences of m.)

24 Exercise 13 establishes an interesting relation between the two multisets Spec(α) and Spec(α/(α − 1)), when α is any irrational number > 1, because 1/α + (α − 1)/α = 1. Find (and prove) an interesting relation between the two multisets Spec(α) and Spec(α/(α + 1)), when α is any positive real number.

25 Prove or disprove that the Knuth numbers, defined by (3.16), satisfy K_n ≥ n for all nonnegative n.

26 Show that the auxiliary Josephus numbers (3.20) satisfy

$$\left(\frac q{q-1}\right)^n \le D_n^{(q)} \le q\left(\frac q{q-1}\right)^n, \qquad \text{for } n \ge 0.$$

27 Prove that infinitely many of the numbers defined by (3.20) are even, and that infinitely many are odd.

28 Solve the recurrence

$$a_0 = 1; \qquad a_n = a_{n-1} + \bigl\lfloor\sqrt{a_{n-1}}\bigr\rfloor, \quad \text{for } n > 0.$$

29 Show that, in addition to (3.31), we have

$$D(\alpha,n) \ge D(\alpha',\lfloor\alpha n\rfloor) - \alpha^{-1} - 2.$$


There’s a discrep-
ancy between this
formula and (3.31)


30 Show that the recurrence

$$X_0 = m; \qquad X_n = X_{n-1}^2 - 2, \quad \text{for } n > 0,$$

has the solution X_n = ⌈α^(2ⁿ)⌉, if m is an integer greater than 2, where α + α⁻¹ = m and α > 1. For example, if m = 3 the solution is

$$X_n = \bigl\lceil\phi^{2^{n+1}}\bigr\rceil, \qquad \phi = \frac{1+\sqrt5}2, \quad \alpha = \phi^2.$$


31 Prove or disprove: ⌊x⌋ + ⌊y⌋ + ⌊x + y⌋ ≤ ⌊2x⌋ + ⌊2y⌋.

32 Let ‖x‖ = min(x − ⌊x⌋, ⌈x⌉ − x) denote the distance from x to the nearest integer. What is the value of

$$\sum_k 2^k\,\|x/2^k\|^2\ ?$$

(Note that this sum can be doubly infinite. For example, when x = 1/3 the terms are nonzero as k → −∞ and also as k → +∞.)

Exam problems 

33 A circle, 2n − 1 units in diameter, has been drawn symmetrically on a 2n × 2n chessboard, illustrated here for n = 3:



a How many cells of the board contain a segment of the circle?

b Find a function f(n, k) such that exactly Σ_{k=1}^{n−1} f(n, k) cells of the board lie entirely within the circle.

34 Let f(n) = Σ_{k=1}^{n} ⌈lg k⌉.

a Find a closed form for f(n), when n ≥ 1.
b Prove that f(n) = n − 1 + f(⌈n/2⌉) + f(⌊n/2⌋) for all n ≥ 1.

35 Simplify the formula ⌊(n + 1)² n! e⌋ mod n.

36 Assuming that n is a nonnegative integer, find a closed form for the sum

$$\sum_{1<k<2^{2^n}} \frac1{2^{\lfloor\lg k\rfloor}\,4^{\lfloor\lg\lg k\rfloor}}.$$


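37 Prove the identity

$$\sum_{0\le k<m}\left(\left\lfloor\frac{m+k}n\right\rfloor - \left\lfloor\frac kn\right\rfloor\right) = \left\lfloor\frac{m^2}n\right\rfloor - \left\lfloor\frac{\min\bigl(m\bmod n,\ (-m)\bmod n\bigr)^2}n\right\rfloor$$

for all positive integers m and n.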

38 Let x_1, …, x_n be real numbers such that the identity

$$\sum_{k=1}^n\lfloor mx_k\rfloor = \Bigl\lfloor m\sum_{1\le k\le n}x_k\Bigr\rfloor$$

holds for all positive integers m. Prove something interesting about x_1, …, x_n.


Simplify it, but 
don’t change the
value. 
39 Prove that the double sum

$$\sum_{0\le k\le\log_b x}\ \sum_{0<j<b}\left\lceil\frac{x+jb^k}{b^{k+1}}\right\rceil$$

equals (b − 1)(⌊log_b x⌋ + 1) + ⌈x⌉ − 1 for every real number x ≥ 1 and every integer b > 1.

40 The spiral function σ(n), indicated in the diagram below, maps a nonnegative integer n onto an ordered pair of integers (x(n), y(n)). For example, it maps n = 9 onto the ordered pair (1, 2).


People in the south-
ern hemisphere use
a different spiral.


[Diagram: the spiral function σ(n)]

a Prove that if m = ⌊√n⌋,

$$x(n) = (-1)^m\Bigl(\bigl(n-m(m+1)\bigr)\cdot\bigl[\lfloor2\sqrt n\rfloor \text{ is even}\bigr] + \bigl\lceil\tfrac12m\bigr\rceil\Bigr),$$

and find a similar formula for y(n). Hint: Classify the spiral into segments W_k, S_k, E_k, N_k according as ⌊2√n⌋ = 4k − 2, 4k − 1, 4k, 4k + 1.

b Prove that, conversely, we can determine n from σ(n) by a formula of the form

$$n = (2k)^2 \pm \bigl(2k + x(n) + y(n)\bigr), \qquad k = \max\bigl(|x(n)|, |y(n)|\bigr).$$

Give a rule for when the sign is + and when the sign is −.

Bonus problems 

41 Let f and g be increasing functions such that the sets {f(1), f(2), …} and {g(1), g(2), …} partition the positive integers. Suppose that f and g are related by the condition g(n) = f(f(n)) + 1 for all n > 0. Prove that f(n) = ⌊nφ⌋ and g(n) = ⌊nφ²⌋, where φ = (1 + √5)/2.

42 Do there exist real numbers α, β, and γ such that Spec(α), Spec(β), and Spec(γ) together partition the set of positive integers?
43 Find an interesting interpretation of the Knuth numbers, by unfolding the recurrence (3.16).

44 Show that there are integers a_n^(q) and d_n^(q) such that

$$a_n^{(q)} = \frac{D_{n-1}^{(q)} + d_n^{(q)}}{q-1} = \frac{D_n^{(q)} + d_n^{(q)}}{q}, \qquad \text{for } n > 0,$$

when D_n^(q) is the solution to (3.20). Use this fact to obtain another form of the solution to the generalized Josephus problem:

$$J_q(n) = 1 + d_k^{(q)} + q\bigl(n - a_k^{(q)}\bigr), \qquad \text{for } a_k^{(q)} \le n < a_{k+1}^{(q)}.$$

45 Extend the trick of exercise 30 to find a closed-form solution to

$$Y_0 = m; \qquad Y_n = 2Y_{n-1}^2 - 1, \quad \text{for } n > 0,$$

if m is a positive integer.

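46 Prove that if

$$n = \bigl\lfloor(\sqrt2^{\,l} + \sqrt2^{\,l+1})\,m\bigr\rfloor,$$

where m and l are nonnegative integers, then

$$\bigl\lfloor\sqrt{2n(n+1)}\bigr\rfloor = \bigl\lfloor(\sqrt2^{\,l+1} + \sqrt2^{\,l+2})\,m\bigr\rfloor.$$

Use this remarkable property to find a closed form solution to the recurrence

$$L_0 = a, \ \text{integer } a > 0; \qquad L_n = \bigl\lfloor\sqrt{2L_{n-1}(L_{n-1}+1)}\bigr\rfloor, \quad \text{for } n > 0.$$

Hint: ⌊√(2n(n + 1))⌋ = ⌊√2 (n + ½)⌋.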

47 The function f(x) is said to be replicative if it satisfies

$$f(mx) = f(x) + f\Bigl(x+\frac1m\Bigr) + \cdots + f\Bigl(x+\frac{m-1}m\Bigr)$$

for every positive integer m. Find necessary and sufficient conditions on the real number c for the following functions to be replicative:
a f(x) = x + c.
b f(x) = [x + c is an integer].
c f(x) = max(⌊x⌋, c).
d f(x) = x + c⌊x⌋ − ½[x is not an integer].

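48 Prove the identity

$$x^3 = 3x\bigl\lfloor x\lfloor x\rfloor\bigr\rfloor + 3\{x\}\bigl\{x\lfloor x\rfloor\bigr\} + \{x\}^3 - 3\lfloor x\rfloor\bigl\lfloor x\lfloor x\rfloor\bigr\rfloor + \lfloor x\rfloor^3,$$

and show how to obtain similar formulas for xⁿ when n > 3.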
Spec this be hard.


49 Find a necessary and sufficient condition on the real numbers 0 ≤ α < 1 and β ≥ 0 such that we can determine α and β from the infinite multiset of values

{ ⌊nα⌋ + ⌊nβ⌋ | n > 0 }.

Research problems 

50 Find a necessary and sufficient condition on the nonnegative real numbers α and β such that we can determine α and β from the infinite multiset of values

{ ⌊⌊nα⌋β⌋ | n > 0 }.

51 Let x be a real number ≥ φ = ½(1 + √5). The solution to the recurrence

$$Z_0(x) = x; \qquad Z_n(x) = Z_{n-1}(x)^2 - 1, \quad \text{for } n > 0,$$

can be written Z_n(x) = ⌈f(x)^(2ⁿ)⌉, if x is an integer, where

$$f(x) = \lim_{n\to\infty} Z_n(x)^{1/2^n},$$

because Z_n(x) − 1 < f(x)^(2ⁿ) < Z_n(x) in that case. What other interesting properties does this function f(x) have?

52 Given nonnegative real numbers α and β, let

Spec(α; β) = { ⌊α + β⌋, ⌊2α + β⌋, ⌊3α + β⌋, … }

be a multiset that generalizes Spec(α) = Spec(α; 0). Prove or disprove: If the m ≥ 3 multisets Spec(α_1; β_1), Spec(α_2; β_2), …, Spec(α_m; β_m) partition the positive integers, and if the parameters α_1 < α_2 < ··· < α_m are rational, then

$$\alpha_k = \frac{2^m-1}{2^{k-1}}, \qquad \text{for } 1 \le k \le m.$$

53 Fibonacci’s algorithm (exercise 9) is “greedy” in the sense that it chooses 
the least conceivable q at every step. A more complicated algorithm is 
known by which every fraction m/n with n odd can be represented as a 
sum of distinct unit fractions 1/q_1 + ··· + 1/q_k with odd denominators.
Does the greedy algorithm for such a representation always terminate? 
Number Theory


INTEGERS ARE CENTRAL to the discrete mathematics we are emphasiz- 
ing in this book. Therefore we want to explore the theory of numbers, an 
important branch of mathematics concerned with the properties of integers. 

We tested the number theory waters in the previous chapter, by intro- 
ducing binary operations called ‘mod’ and ‘gcd’. Now let’s plunge in and 
really immerse ourselves in the subject. 


4.1 DIVISIBILITY 

We say that m divides n (or n is divisible by m) if m > 0 and the 
ratio n/m is an integer. This property underlies all of number theory, so it’s 
convenient to have a special notation for it. We therefore write 

$$m\backslash n \iff m > 0 \text{ and } n = mk \text{ for some integer } k. \qquad (4.1)$$

(The notation ‘m|n’ is actually much more common than ‘m\n’ in current 
mathematics literature. But vertical lines are overused — for absolute val- 
ues, set delimiters, conditional probabilities, etc. — and backward slashes are 
underused. Moreover, ‘m\n’ gives an impression that m is the denominator of 
an implied ratio. So we shall boldly let our divisibility symbol lean leftward.) 

If m does not divide n we write ‘$m\not\backslash n$’.

There’s a similar relation, “n is a multiple of m,” which means almost 
the same thing except that m doesn’t have to be positive. In this case we 
simply mean that n = mk for some integer k. Thus, for example, there’s only 
one multiple of 0 (namely 0), but nothing is divisible by 0. Every integer is 
a multiple of —1, but no integer is divisible by —1 (strictly speaking). These 
definitions apply when m and n are any real numbers; for example, 2n is 
divisible by n. But we’ll almost always be using them when m and n are 
integers. After all, this is number theory. 


In other words, be 
prepared to drown. 


“. . . no integer is
divisible by −1
(strictly speaking).”
— Graham, Knuth, 
and Patashnik [161] 
In Britain we call
this ‘hcf’ (highest
common factor).


Not to be confused
with the greatest
common multiple.


(Remember that
m' or n' can be
negative.)


The greatest common divisor of two integers m and n is the largest 
integer that divides them both: 

$$\gcd(m,n) = \max\{\,k \mid k\backslash m \text{ and } k\backslash n\,\}. \qquad (4.2)$$

For example, gcd(12, 18) = 6. This is a familiar notion, because it’s the common factor that fourth graders learn to take out of a fraction m/n when reducing it to lowest terms: 12/18 = (12/6)/(18/6) = 2/3. Notice that if n > 0 we have gcd(0, n) = n, because any positive number divides 0, and because n is the largest divisor of itself. The value of gcd(0, 0) is undefined.
Another familiar notion is the least common multiple,

$$\operatorname{lcm}(m,n) = \min\{\,k \mid k > 0,\ m\backslash k \text{ and } n\backslash k\,\}; \qquad (4.3)$$

this is undefined if m ≤ 0 or n ≤ 0. Students of arithmetic recognize this as the least common denominator, which is used when adding fractions with denominators m and n. For example, lcm(12, 18) = 36, and fourth graders know that 7/12 + 1/18 = 21/36 + 2/36 = 23/36. The lcm is somewhat analogous to the gcd, but we don’t give it equal time because the gcd has nicer properties.

One of the nicest properties of the gcd is that it is easy to compute, using
a 2300-year-old method called Euclid’s algorithm. To calculate gcd(m, n),
for given values 0 ≤ m < n, Euclid’s algorithm uses the recurrence

$$\begin{aligned} \gcd(0,n) &= n;\\ \gcd(m,n) &= \gcd(n \bmod m,\ m), \qquad \text{for } m > 0. \end{aligned} \tag{4.4}$$

Thus, for example, gcd(12, 18) = gcd(6, 12) = gcd(0, 6) = 6. The stated
recurrence is valid, because any common divisor of m and n must also be a
common divisor of both m and the number n mod m, which is n − ⌊n/m⌋m.
There doesn’t seem to be any recurrence for lcm(m, n) that’s anywhere near
as simple as this. (See exercise 2.)
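Recurrence (4.4) translates almost word for word into a recursive function. A minimal Python sketch, assuming nonnegative integer arguments that are not both zero (the name `gcd_euclid` is ours):

```python
def gcd_euclid(m: int, n: int) -> int:
    """Euclid's algorithm, following recurrence (4.4).

    Assumes m, n >= 0, not both zero. If m > n, the first step simply
    swaps the arguments, since n mod m = n in that case.
    """
    if m == 0:
        return n                        # gcd(0, n) = n
    return gcd_euclid(n % m, m)         # gcd(m, n) = gcd(n mod m, m) for m > 0


print(gcd_euclid(12, 18))  # 6, by way of gcd(6, 12) and gcd(0, 6)
```

Python's standard library already offers `math.gcd`; the point here is only to mirror the recurrence.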

Euclid’s algorithm also gives us more: We can extend it so that it will
compute integers m′ and n′ satisfying

$$m'm + n'n = \gcd(m,n). \tag{4.5}$$

Here’s how. If m = 0, we simply take m′ = 0 and n′ = 1. Otherwise we
let r = n mod m and apply the method recursively with r and m in place of
m and n, computing r̄ and m̄ such that

$$\bar r\, r + \bar m\, m = \gcd(r, m).$$

Since r = n − ⌊n/m⌋m and gcd(r, m) = gcd(m, n), this equation tells us that

$$\bar r\,\bigl(n - \lfloor n/m \rfloor m\bigr) + \bar m\, m = \gcd(m,n).$$

The left side can be rewritten to show its dependency on m and n:

$$\bigl(\bar m - \lfloor n/m \rfloor \bar r\bigr)\, m + \bar r\, n = \gcd(m,n);$$

hence m′ = m̄ − ⌊n/m⌋r̄ and n′ = r̄ are the integers we need in (4.5). For
example, in our favorite case m = 12, n = 18, this method gives
6 = 0·0 + 1·6 = 1·6 + 0·12 = (−1)·12 + 1·18.
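The recursive construction just described is short enough to transcribe directly. A minimal Python sketch, assuming nonnegative arguments that are not both zero; `m1` and `n1` stand for m′ and n′, and the function name is ours:

```python
def extended_gcd(m: int, n: int) -> tuple[int, int, int]:
    """Return (d, m1, n1) with m1*m + n1*n == d == gcd(m, n), as in (4.5).

    m1 and n1 stand for m' and n'. Assumes m, n >= 0, not both zero.
    """
    if m == 0:
        return n, 0, 1                       # m' = 0, n' = 1
    r = n % m                                # r = n - floor(n/m)*m
    d, r_bar, m_bar = extended_gcd(r, m)     # r_bar*r + m_bar*m == gcd(r, m)
    return d, m_bar - (n // m) * r_bar, r_bar


d, m1, n1 = extended_gcd(12, 18)
print(d, m1, n1)   # 6 -1 1, i.e. 6 = (-1)*12 + 1*18
```

The intermediate results for (12, 18) are (6, 0, 1) for (0, 6) and (6, 1, 0) for (6, 12), matching the chain of equations in the text.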

But why is (4.5) such a neat result? The main reason is that there’s a
sense in which the numbers m′ and n′ actually prove that Euclid’s algorithm
has produced the correct answer in any particular case. Let’s suppose that
our computer has told us after a lengthy calculation that gcd(m, n) = d and
that m′m + n′n = d; but we’re skeptical and think that there’s really a
greater common divisor, which the machine has somehow overlooked. This
cannot be, however, because any common divisor of m and n has to divide
m′m + n′n; so it has to divide d; so it has to be ≤ d. Furthermore we can
easily check that d does divide both m and n. (Algorithms that output their
own proofs of correctness are called self-certifying.)
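Checking a certificate is much cheaper than finding it, and the check does not need to trust how d, m′, n′ were produced. A small Python sketch of such a verifier (our own illustration; the function name is hypothetical):

```python
def certified_gcd(m: int, n: int, d: int, m1: int, n1: int) -> bool:
    """Accept a claimed d = gcd(m, n) only if the certificate (m1, n1) proves it.

    Here m1, n1 play the roles of m' and n'. If d > 0 divides m and n, and
    m1*m + n1*n == d, then every common divisor of m and n divides d, so no
    common divisor can exceed d.
    """
    return d > 0 and m % d == 0 and n % d == 0 and m1 * m + n1 * n == d


print(certified_gcd(12, 18, 6, -1, 1))  # True
print(certified_gcd(12, 18, 3, -1, 1))  # False: (-1)*12 + 1*18 is 6, not 3
print(certified_gcd(12, 18, 2, 1, -1))  # False: 1*12 + (-1)*18 is -6, not 2
```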

We’ll be using (4.5) a lot in the rest of this chapter. One of its important 
consequences is the following mini-theorem: 

$$k\backslash m \text{ and } k\backslash n \iff k\backslash \gcd(m,n). \tag{4.6}$$

(Proof: If k divides both m and n, it divides m'm + n'n, so it divides 
gcd(m, n). Conversely, if k divides gcd(m, n), it divides a divisor of m and a 
divisor of n, so it divides both m and n.) We always knew that any common 
divisor of m and n must be less than or equal to their gcd; that’s the 
definition of greatest common divisor. But now we know that any common 
divisor is, in fact, a divisor of their gcd. 

Sometimes we need to do sums over all divisors of n. In this case it’s
often useful to use the handy rule

$$\sum_{m\backslash n} a_m = \sum_{m\backslash n} a_{n/m}, \qquad \text{integer } n > 0, \tag{4.7}$$

which holds since n/m runs through all divisors of n when m does. For
example, when n = 12 this says that a₁ + a₂ + a₃ + a₄ + a₆ + a₁₂ = a₁₂ +
a₆ + a₄ + a₃ + a₂ + a₁.
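The reason (4.7) works is visible if we list the divisors: the map m ↦ n/m just reverses the list. A short Python illustration, where the sequence `a` is an arbitrary choice made up for the example:

```python
def divisors(n: int) -> list[int]:
    r"""The positive divisors m of n (m\n in the text's notation), for n > 0."""
    return [m for m in range(1, n + 1) if n % m == 0]


def a(m: int) -> int:
    """An arbitrary sequence a_m, chosen only for illustration."""
    return m * m + 7 * m


n = 12
ds = divisors(n)
print(ds)                      # [1, 2, 3, 4, 6, 12]
print([n // m for m in ds])    # [12, 6, 4, 3, 2, 1]: the same divisors, reversed
print(sum(a(m) for m in ds) == sum(a(n // m) for m in ds))  # True, as in (4.7)
```

Integer-valued terms are used so that the two sums, added in different orders, compare exactly.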

There’s also a slightly more general identity,

$$\sum_{m\backslash n} a_m = \sum_k \sum_{m>0} a_m\,[n = mk], \tag{4.8}$$

which is an immediate consequence of the definition (4.1). If n is positive, the
right-hand side of (4.8) is $\sum_{k\backslash n} a_{n/k}$; hence (4.8) implies (4.7). And equation
(4.8) works also when n is negative. (In such cases, the nonzero terms on the
right occur when k is the negative of a divisor of n.)

How about the p in
‘explicitly’?

Moreover, a double sum over divisors can be “interchanged” by the law

$$\sum_{m\backslash n}\ \sum_{k\backslash m} a_{k,m} = \sum_{k\backslash n}\ \sum_{l\backslash (n/k)} a_{k,kl}. \tag{4.9}$$

For example, this law takes the following form when n = 12:

$$\begin{aligned}
&a_{1,1} + (a_{1,2} + a_{2,2}) + (a_{1,3} + a_{3,3})\\
&\quad + (a_{1,4} + a_{2,4} + a_{4,4}) + (a_{1,6} + a_{2,6} + a_{3,6} + a_{6,6})\\
&\quad + (a_{1,12} + a_{2,12} + a_{3,12} + a_{4,12} + a_{6,12} + a_{12,12})\\
&= (a_{1,1} + a_{1,2} + a_{1,3} + a_{1,4} + a_{1,6} + a_{1,12})\\
&\quad + (a_{2,2} + a_{2,4} + a_{2,6} + a_{2,12}) + (a_{3,3} + a_{3,6} + a_{3,12})\\
&\quad + (a_{4,4} + a_{4,12}) + (a_{6,6} + a_{6,12}) + a_{12,12}.
\end{aligned}$$

We can prove (4.9) with Iversonian manipulation. The left-hand side is

$$\sum_{j,l}\ \sum_{k,m>0} a_{k,m}\,[n = jm]\,[m = kl] = \sum_{j}\ \sum_{k,l>0} a_{k,kl}\,[n = jkl];$$

the right-hand side is

$$\sum_{j,m}\ \sum_{k,l>0} a_{k,kl}\,[n = jk]\,[n/k = ml] = \sum_{m}\ \sum_{k,l>0} a_{k,kl}\,[n = mlk],$$

which is the same except for renaming the indices. This example indicates 
that the techniques we’ve learned in Chapter 2 will come in handy as we study 
number theory. 
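Because (4.9) holds for every choice of the terms a_{k,m}, it amounts to saying that both sides run over exactly the same index pairs. A quick Python check of that statement (function names are ours; `ell` stands for the index l):

```python
def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def left_indices(n: int) -> list[tuple[int, int]]:
    r"""Pairs (k, m) summed on the left of (4.9): m\n, then k\m."""
    return [(k, m) for m in divisors(n) for k in divisors(m)]


def right_indices(n: int) -> list[tuple[int, int]]:
    r"""Pairs (k, k*l) summed on the right of (4.9): k\n, then l\(n/k)."""
    return [(k, k * ell) for k in divisors(n) for ell in divisors(n // k)]


print(len(left_indices(12)))                                  # 18 terms
print(sorted(left_indices(12)) == sorted(right_indices(12)))  # True
```

The 18 pairs for n = 12 are exactly the 18 subscripts in the expanded example above.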

4.2 PRIMES 

A positive integer p is called prime if it has just two divisors, namely 
1 and p. Throughout the rest of this chapter, the letter p will always stand 
for a prime number, even when we don’t say so explicitly. By convention, 
1 isn’t prime, so the sequence of primes starts out like this: 

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, ... . 

Some numbers look prime but aren’t, like 91 (= 7·13) and 161 (= 7·23). These
numbers and others that have three or more divisors are called composite. 
Every integer greater than 1 is either prime or composite, but not both. 

Primes are of great importance, because they’re the fundamental building 
blocks of all the positive integers. Any positive integer n can be written as a 
product of primes,

$$n = p_1 \ldots p_m = \prod_{k=1}^{m} p_k, \qquad p_1 \le \cdots \le p_m. \tag{4.10}$$

For example, 12 = 2·2·3; 11011 = 7·11·11·13; 11111 = 41·271. (Products
denoted by ∏ are analogous to sums denoted by Σ, as explained in
exercise 2.25. If m = 0, we consider this to be an empty product, whose value
is 1 by definition; that’s the way n = 1 gets represented by (4.10).) Such a
factorization is always possible because if n > 1 is not prime it has a divisor
n₁ such that 1 < n₁ < n; thus we can write n = n₁·n₂, and (by induction)
we know that n₁ and n₂ can be written as products of primes.
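One simple way to produce the nondecreasing list p₁ ≤ ⋯ ≤ pₘ of (4.10) is trial division. The sketch below is our own illustration (not a method from the text) and is only sensible for small n:

```python
def prime_factors(n: int) -> list[int]:
    """Return [p_1, ..., p_m] with p_1 <= ... <= p_m and n = p_1 ... p_m.

    Plain trial division, assuming n >= 1; n = 1 gives the empty product [].
    """
    factors: list[int] = []
    p = 2
    while p * p <= n:
        while n % p == 0:       # only primes can divide n here, because all
            factors.append(p)   # smaller primes have already been divided out
            n //= p
        p += 1
    if n > 1:                   # what remains has no factor <= its square root
        factors.append(n)
    return factors


print(prime_factors(12))     # [2, 2, 3]
print(prime_factors(11011))  # [7, 11, 11, 13]
print(prime_factors(11111))  # [41, 271]
```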

Moreover, the expansion in ( 4 . 10 ) is unique: There’s only one way to 
write n as a product of primes in nondecreasing order. This statement is 
called the Fundamental Theorem of Arithmetic, and it seems so obvious that 
we might wonder why it needs to be proved. How could there be two different 
sets of primes with the same product? Well, there can’t, but the reason isn’t 
simply “by definition of prime numbers.” For example, if we consider the set 
of all real numbers of the form m + n√10 when m and n are integers, the
product of any two such numbers is again of the same form, and we can call
such a number “prime” if it can’t be factored in a nontrivial way. The number
6 has two representations, 2·3 = (4 + √10)(4 − √10); yet exercise 36 shows
that 2, 3, 4 + √10, and 4 − √10 are all “prime” in this system.

Therefore we should prove rigorously that ( 4 . 10 ) is unique. There is 
certainly only one possibility when n = 1 , since the product must be empty 
in that case; so let’s suppose that n > 1 and that all smaller numbers factor 
uniquely. Suppose we have two factorizations 

$$n = p_1 \ldots p_m = q_1 \ldots q_k, \qquad p_1 \le \cdots \le p_m \text{ and } q_1 \le \cdots \le q_k,$$

where the p’s and q’s are all prime. We will prove that p₁ = q₁. If not, we
can assume that p₁ < q₁, making p₁ smaller than all the q’s. Since p₁ and
q₁ are prime, their gcd must be 1; hence Euclid’s self-certifying algorithm
gives us integers a and b such that ap₁ + bq₁ = 1. Therefore

$$a p_1 q_2 \ldots q_k + b q_1 q_2 \ldots q_k = q_2 \ldots q_k.$$

Now p₁ divides both terms on the left, since q₁q₂ … qₖ = n; hence p₁ divides
the right-hand side, q₂ … qₖ. Thus q₂ … qₖ/p₁ is an integer, and q₂ … qₖ
has a prime factorization in which p₁ appears. But q₂ … qₖ < n, so it has a
unique factorization (by induction), namely q₂ … qₖ itself, in which p₁ does
not appear because p₁ is smaller than every q. This contradiction shows that
p₁ must be equal to q₁ after all. Therefore we can divide both of n’s
factorizations by p₁, obtaining p₂ … pₘ = q₂ … qₖ < n. The other factors must
likewise be equal (by induction), so our proof of uniqueness is complete.
It’s the factorization,
not the theorem,
that’s unique.


Sometimes it’s more useful to state the Fundamental Theorem in another 
way: Every positive integer can be written uniquely in the form 

$$n = \prod_p p^{n_p}, \qquad \text{where each } n_p \ge 0. \tag{4.11}$$

The right-hand side is a product over infinitely many primes; but for any 
particular n all but a few exponents are zero, so the corresponding factors 
are 1. Therefore it’s really a finite product, just as many “infinite” sums are 
really finite because their terms are mostly zero. 

Formula (4.11) represents n uniquely, so we can think of the sequence
(n₂, n₃, n₅, …) as a number system for positive integers. For example, the
prime-exponent representation of 12 is (2, 1, 0, 0, …) and the prime-exponent
representation of 18 is (1, 2, 0, 0, …). To multiply two numbers, we simply
add their representations. In other words,

$$k = mn \iff k_p = m_p + n_p \quad \text{for all } p. \tag{4.12}$$

This implies that

$$m\backslash n \iff m_p \le n_p \quad \text{for all } p, \tag{4.13}$$

and it follows immediately that

$$k = \gcd(m,n) \iff k_p = \min(m_p, n_p) \quad \text{for all } p; \tag{4.14}$$

$$k = \operatorname{lcm}(m,n) \iff k_p = \max(m_p, n_p) \quad \text{for all } p. \tag{4.15}$$

For example, since 12 = 2²·3¹ and 18 = 2¹·3², we can get their gcd and lcm
by taking the min and max of common exponents:

$$\begin{aligned}
\gcd(12,18) &= 2^{\min(2,1)}\cdot 3^{\min(1,2)} = 2^1\cdot 3^1 = 6;\\
\operatorname{lcm}(12,18) &= 2^{\max(2,1)}\cdot 3^{\max(1,2)} = 2^2\cdot 3^2 = 36.
\end{aligned}$$
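The prime-exponent view can be tried out directly: represent n by a map from p to n_p, then take minima and maxima as in (4.14) and (4.15). A Python sketch using trial division (fine only for small inputs; all function names are ours):

```python
from collections import Counter


def exponents(n: int) -> Counter:
    """Prime-exponent representation of n >= 1: maps each p to n_p (zeros omitted)."""
    e: Counter = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            e[p] += 1
            n //= p
        p += 1
    if n > 1:
        e[n] += 1
    return e


def from_exponents(e: dict) -> int:
    """Rebuild the product of p**e[p]."""
    result = 1
    for p, k in e.items():
        result *= p ** k
    return result


def gcd_lcm(m: int, n: int) -> tuple[int, int]:
    """gcd and lcm via (4.14) and (4.15): min and max of each exponent."""
    em, en = exponents(m), exponents(n)
    primes = set(em) | set(en)
    g = {p: min(em[p], en[p]) for p in primes}   # a missing prime counts as 0
    h = {p: max(em[p], en[p]) for p in primes}
    return from_exponents(g), from_exponents(h)


print(exponents(12 * 18) == exponents(12) + exponents(18))  # True: (4.12)
print(gcd_lcm(12, 18))                                      # (6, 36)
```

Adding two `Counter` objects adds their exponents prime by prime, which is exactly the multiplication rule (4.12).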

If the prime p divides a product mn then it divides either m or n, perhaps 
both, because of the unique factorization theorem. But composite numbers 
do not have this property. For example, the nonprime 4 divides 60 = 6·10,
but it divides neither 6 nor 10. The reason is simple: In the factorization
60 = 6·10 = (2·3)(2·5), the two prime factors of 4 = 2·2 have been split
into two parts, hence 4 divides neither part. But a prime is unsplittable, so 
it must divide one of the original factors. 


4.3 PRIME EXAMPLES 

How many primes are there? A lot. In fact, infinitely many. Euclid 
proved this long ago in his Theorem 9:20, as follows. Suppose there were only
finitely many primes, say k of them: 2, 3, 5, …, Pₖ. Then, said Euclid, we
should consider the number

$$M = 2\cdot 3\cdot 5\cdot \ldots \cdot P_k + 1.$$

None of the k primes can divide M, because each divides M − 1. Thus there
must be some other prime that divides M; perhaps M itself is prime. This
contradicts our assumption that 2, 3, …, Pₖ are the only primes, so there
must indeed be infinitely many.

Euclid’s proof suggests that we define Euclid numbers by the recurrence

$$e_n = e_1 e_2 \ldots e_{n-1} + 1, \qquad \text{when } n \ge 1. \tag{4.16}$$


“Οἱ πρῶτοι ἀριθμοὶ
πλείους εἰσὶ παντὸς
τοῦ προτεθέντος
πλήθους πρώτων
ἀριθμῶν.”

— Euclid [98] 
[Translation: 

“There are more 
primes than in 
any given set 
of primes.”] 


The sequence starts out 


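$$\begin{aligned}
e_1 &= 1 + 1 = 2;\\
e_2 &= 2 + 1 = 3;\\
e_3 &= 2\cdot 3 + 1 = 7;\\
e_4 &= 2\cdot 3\cdot 7 + 1 = 43;
\end{aligned}$$

these are all prime. But the next case, e₅, is 1807 = 13·139. It turns out that
e₆ = 3263443 is prime, while

$$\begin{aligned}
e_7 &= 547\cdot 607\cdot 1033\cdot 31051;\\
e_8 &= 29881\cdot 67003\cdot 9119521\cdot 6212157481.
\end{aligned}$$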


It is known that e₉, …, e₁₇ are composite, and the remaining eₙ are probably
composite as well. However, the Euclid numbers are all relatively prime to
each other; that is,

$$\gcd(e_m, e_n) = 1, \qquad \text{when } m \ne n.$$

Euclid’s algorithm (what else?) tells us this in three short steps, because
eₙ mod eₘ = 1 when n > m:

$$\gcd(e_m, e_n) = \gcd(1, e_m) = \gcd(0, 1) = 1.$$

Therefore, if we let qⱼ be the smallest prime factor of eⱼ for all j ≥ 1, the primes q₁,
q₂, q₃, … are all different. This is a sequence of infinitely many primes.
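The first few Euclid numbers are small enough to check these claims by machine. A Python sketch that builds eₙ from the product form (4.16), confirms pairwise relative primality, and lists the smallest prime factors qⱼ (trial division, suitable only for these small values; names are ours):

```python
from math import gcd, prod


def euclid_numbers(count: int) -> list[int]:
    """e_1, ..., e_count from (4.16): e_n = e_1 e_2 ... e_{n-1} + 1."""
    es: list[int] = []
    for _ in range(count):
        es.append(prod(es) + 1)          # prod([]) == 1, so e_1 = 2
    return es


def smallest_prime_factor(n: int) -> int:
    """Trial division; fine for the small values used here."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


es = euclid_numbers(6)
print(es)   # [2, 3, 7, 43, 1807, 3263443]
print(all(gcd(a, b) == 1 for i, a in enumerate(es) for b in es[i + 1:]))  # True
print([smallest_prime_factor(e) for e in es])  # [2, 3, 7, 43, 13, 3263443]
```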

Let’s pause to consider the Euclid numbers from the standpoint of Chapter 1.
Can we express eₙ in closed form? Recurrence (4.16) can be simplified
by removing the three dots: If n > 1 we have

$$e_n = e_1 \ldots e_{n-2}\, e_{n-1} + 1 = (e_{n-1} - 1)\, e_{n-1} + 1 = e_{n-1}^2 - e_{n-1} + 1.$$
Or probably more,
by the time you 
read this. 


Thus eₙ has about twice as many decimal digits as eₙ₋₁. Exercise 37 proves
that there’s a constant E ≈ 1.264 such that

$$e_n = \bigl\lfloor E^{2^n} + \tfrac12 \bigr\rfloor. \tag{4.17}$$

And exercise 60 provides a similar formula that gives nothing but primes:

$$p_n = \bigl\lfloor P^{3^n} \bigr\rfloor, \tag{4.18}$$

for some constant P. But equations like (4.17) and (4.18) cannot really be 
considered to be in closed form, because the constants E and P are computed 
from the numbers e n and p n in a sort of sneaky way. No independent re- 
lation is known (or likely) that would connect them with other constants of 
mathematical interest. 

Indeed, nobody knows any useful formula that gives arbitrarily large 
primes but only primes. Computer scientists at Chevron Geosciences did, 
however, strike mathematical oil in 1984. Using a program developed by 
David Slowinski, they discovered the largest prime known at that time, 

$$2^{216091} - 1$$

while testing a new Cray X-MP supercomputer. It’s easy to compute this
number in a few milliseconds on a personal computer, because modern
computers work in binary notation and this number is simply (11…1)₂. All
216,091 of its bits are ‘1’. But it’s much harder to prove that this number
is prime. In fact, just about any computation with it takes a lot of time,
because it’s so large. For example, even a sophisticated algorithm requires
several minutes just to convert 2²¹⁶⁰⁹¹ − 1 to radix 10 on a PC. When printed
out, its 65,050 decimal digits require 75 cents U.S. postage to mail first class.
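Python's arbitrary-precision integers make the binary claims easy to confirm. Counting decimal digits is done with a logarithm rather than `str()`: CPython 3.11 and later refuse by default to convert integers of more than 4300 digits to decimal strings. A minimal sketch:

```python
from math import floor, log10

p = 216091
M = 2 ** p - 1                     # the Mersenne number from the text

print(M.bit_length())              # 216091: the number has 216,091 bits
print(bin(M).count("1"))           # 216091: every bit is 1, i.e. (11...1)_2
# 2^p - 1 has as many decimal digits as 2^p, which is floor(p*log10(2)) + 1
# because 2^p is never a power of 10.
print(floor(p * log10(2)) + 1)     # 65050
```

Conversion to binary text via `bin()` is not subject to that digit limit, since the limit applies only to bases that are not powers of 2.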

Incidentally, 2²¹⁶⁰⁹¹ − 1 is the number of moves necessary to solve the
Tower of Hanoi problem when there are 216,091 disks. Numbers of the form

$$2^p - 1$$

(where p is prime, as always in this chapter) are called Mersenne numbers,
after Father Marin Mersenne who investigated some of their properties in
the seventeenth century [269]. The Mersenne primes known to date occur
for p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 
2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 
110503, 132049, 216091, and 756839. 

The number 2ⁿ − 1 can’t possibly be prime if n is composite, because
2ᵏᵐ − 1 has 2ᵐ − 1 as a factor:

$$2^{km} - 1 = (2^m - 1)\bigl(2^{m(k-1)} + 2^{m(k-2)} + \cdots + 1\bigr).$$
But 2ᵖ − 1 isn’t always prime when p is prime; 2¹¹ − 1 = 2047 = 23·89 is the
smallest such nonprime. (Mersenne knew this.)
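Both facts in the last two paragraphs can be spot-checked: the explicit cofactor of 2ᵐ − 1 in 2ᵏᵐ − 1, and the search for the first prime p whose Mersenne number is composite. A small Python sketch (trial division, fine for these sizes; names are ours):

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# 2^(km) - 1 = (2^m - 1)(2^(m(k-1)) + 2^(m(k-2)) + ... + 1), so 2^n - 1 is
# composite whenever n = km with 1 < m < n.
for k in range(1, 10):
    for m in range(1, 10):
        cofactor = sum(2 ** (m * j) for j in range(k))
        assert (2 ** m - 1) * cofactor == 2 ** (k * m) - 1

# The smallest prime p for which 2^p - 1 is not prime:
p = next(p for p in range(2, 100) if is_prime(p) and not is_prime(2 ** p - 1))
print(p, 2 ** p - 1)   # 11 2047
```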

Factoring and primality testing of large numbers are hot topics nowadays. 
A summary of what was known up to 1981 appears in Section 4.5.4 of [208], 
and many new results continue to be discovered. Pages 391-394 of that book 
explain a special way to test Mersenne numbers for primality. 
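The classical special-purpose primality test for Mersenne numbers is the Lucas–Lehmer test; we assume it is the method meant here, though the cited pages may present it differently. A minimal Python sketch for a prime exponent p:

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test for the Mersenne number 2^p - 1, with p prime.

    For odd prime p, 2^p - 1 is prime exactly when s_{p-2} == 0, where
    s_0 = 4 and s_{k+1} = (s_k^2 - 2) mod (2^p - 1).
    """
    if p == 2:
        return True                    # 2^2 - 1 = 3; the recurrence needs odd p
    mersenne = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % mersenne
    return s == 0


small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61]
print([p for p in small_primes if lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19, 31, 61], agreeing with the start of the text's list
```

Each step is one squaring modulo 2ᵖ − 1, which is why Mersenne numbers are so much easier to test than arbitrary numbers of the same size.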

For most of the last two hundred years, the largest known prime has 
been a Mersenne prime, although only 31 Mersenne primes are known. Many 
people are trying to find larger ones, but it’s getting tough. So those really 
interested in fame (if not fortune) and a spot in The Guinness Book of World 
Records might instead try numbers of the form 2ⁿk + 1, for small values of k
like 3 or 5. These numbers can be tested for primality almost as quickly as 
Mersenne numbers can; exercise 4.5.4-27 of [208] gives the details. 

We haven’t fully answered our original question about how many primes 
there are. There are infinitely many, but some infinite sets are “denser” than 
others. For instance, among the positive integers there are infinitely many 
even numbers and infinitely many perfect squares, yet in several important 
senses there are more even numbers than perfect squares. One such sense 
looks at the size of the nth value. The nth even integer is 2n and the nth 
perfect square is n²; since 2n is much less than n² for large n, the nth even
integer occurs much sooner than the nth perfect square, so we can say there
are many more even integers than perfect squares. A similar sense looks at
the number of values not exceeding x. There are ⌊x/2⌋ such even integers and
⌊√x⌋ perfect squares; since x/2 is much larger than √x for large x, again we
can say there are many more even integers.

Weird. I thought
there were the same
number of even
integers as perfect
squares, since
there’s a one-to-one
correspondence
between them.

What can we say about the primes in these two senses? It turns out that
the nth prime, Pₙ, is about n times the natural log of n:

$$P_n \sim n \ln n.$$

(The symbol ‘∼’ can be read “is asymptotic to”; it means that the limit of
the ratio Pₙ/(n ln n) is 1 as n goes to infinity.) Similarly, for the number of
primes π(x) not exceeding x we have what’s known as the prime number
theorem:

$$\pi(x) \sim \frac{x}{\ln x}.$$

Proving these two facts is beyond the scope of this book, although we can 
show easily that each of them implies the other. In Chapter 9 we will discuss 
the rates at which functions approach infinity, and we’ll see that the
function n ln n, our approximation to Pₙ, lies between 2n and n² asymptotically.
Hence there are fewer primes than even integers, but there are more primes 
than perfect squares. 
These formulas, which hold only in the limit as n or x → ∞, can be
replaced by more exact estimates. For example, Rosser and Schoenfeld [312]
have established the handy bounds

$$\frac{x}{\ln x - \frac12} < \pi(x) < \frac{x}{\ln x - \frac32}, \qquad \text{for } x \ge 67; \tag{4.19}$$

$$n\bigl(\ln n + \ln\ln n - \tfrac32\bigr) < P_n < n\bigl(\ln n + \ln\ln n - \tfrac12\bigr), \qquad \text{for } n \ge 20. \tag{4.20}$$
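Bounds like (4.19) and (4.20) are easy to spot-check numerically (a check over a finite range, of course, not a proof). A Python sketch that sieves up to an arbitrarily chosen `LIMIT` and tests both inequalities at every integer in range:

```python
from math import log

LIMIT = 100_000                               # arbitrary range for the spot check

# Sieve of Eratosthenes up to LIMIT.
is_prime = [True] * (LIMIT + 1)
is_prime[0] = is_prime[1] = False
for i in range(2, int(LIMIT ** 0.5) + 1):
    if is_prime[i]:
        for j in range(i * i, LIMIT + 1, i):
            is_prime[j] = False

pi = [0] * (LIMIT + 1)                        # pi[x] = number of primes <= x
for x in range(1, LIMIT + 1):
    pi[x] = pi[x - 1] + is_prime[x]
primes = [x for x in range(LIMIT + 1) if is_prime[x]]   # primes[n - 1] = P_n

# (4.19) at every integer x with 67 <= x <= LIMIT
ok_419 = all(x / (log(x) - 0.5) < pi[x] < x / (log(x) - 1.5)
             for x in range(67, LIMIT + 1))
# (4.20) for every n >= 20 whose P_n does not exceed LIMIT
ok_420 = all(n * (log(n) + log(log(n)) - 1.5) < primes[n - 1]
             < n * (log(n) + log(log(n)) - 0.5)
             for n in range(20, len(primes) + 1))
print(ok_419, ok_420, pi[LIMIT])   # True True 9592
```

The upper bound in (4.20) is fairly tight at n = 20, where P₂₀ = 71 and the bound is about 71.9.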


If we look at a “random” integer n, the chances of its being prime are
about one in ln n. For example, if we look at numbers near 10¹⁶, we’ll have to
examine about 16 ln 10 ≈ 36.8 of them before finding a prime. (It turns out
that there are exactly 10 primes between 10¹⁶ − 370 and 10¹⁶ − 1.) Yet the
distribution of primes has many irregularities. For example, all the numbers
between P₁P₂ … Pₙ + 2 and P₁P₂ … Pₙ + Pₙ₊₁ − 1 inclusive are composite.
Many examples of “twin primes” p and p + 2 are known (5 and 7, 11 and 13,
17 and 19, 29 and 31, …, 9999999999999641 and 9999999999999643, …), yet
nobody knows whether or not there are infinitely many pairs of twin primes.
(See Hardy and Wright [181, §1.4 and §2.8].)

One simple way to calculate all π(x) primes ≤ x is to form the so-called
sieve of Eratosthenes: First write down all integers from 2 through x. Next
circle 2, marking it prime, and cross out all other multiples of 2. Then
repeatedly circle the smallest uncircled, uncrossed number and cross out its other
multiples. When everything has been circled or crossed out, the circled
numbers are the primes. For example when x = 10 we write down 2 through 10,
circle 2, then cross out its multiples 4, 6, 8, and 10. Next 3 is the smallest
uncircled, uncrossed number, so we circle it and cross out 6 and 9. Now
5 is smallest, so we circle it and cross out 10. Finally we circle 7. The circled
numbers are 2, 3, 5, and 7; so these are the π(10) = 4 primes not exceeding 10.
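The circling-and-crossing procedure maps naturally onto a boolean array. A direct Python transcription (the function name is ours):

```python
def sieve_of_eratosthenes(x: int) -> list[int]:
    """Return the primes <= x by circling and crossing out, as described."""
    crossed = [False] * (x + 1)
    circled: list[int] = []
    for k in range(2, x + 1):
        if not crossed[k]:                       # smallest uncircled, uncrossed number
            circled.append(k)                    # circle it: it is prime
            for multiple in range(2 * k, x + 1, k):
                crossed[multiple] = True         # cross out its other multiples
    return circled


primes = sieve_of_eratosthenes(10)
print(primes, len(primes))   # [2, 3, 5, 7] 4, so pi(10) = 4
```

A common refinement starts crossing out at k² instead of 2k, since smaller multiples of k were already crossed out by smaller primes; this version follows the description literally.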


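“Je me sers de la
notation très simple
n! pour désigner le
produit de nombres
décroissans depuis
n jusqu’à l’unité,
savoir n(n − 1)
(n − 2) .... 3.2.1.
L’emploi continuel
de l’analyse
combinatoire que je fais
dans la plupart de
mes démonstrations,
a rendu cette
notation indispensable.”

— Ch. Kramp [228]
[Translation:
“I use the very simple
notation n! to denote
the product of numbers
decreasing from n down
to unity, namely
n(n − 1)(n − 2) … 3·2·1.
The continual use of
combinatorial analysis
that I make in most of
my demonstrations has
made this notation
indispensable.”]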


4.4 FACTORIAL FACTORS 

Now let’s take a look at the factorization of some interesting highly 
composite numbers, the factorials: 

$$n! = 1\cdot 2\cdot \ldots \cdot n = \prod_{k=1}^{n} k, \qquad \text{integer } n \ge 0. \tag{4.21}$$

According to our convention for an empty product, this defines 0! to be 1. 
Thus n! = (n − 1)! n for every positive integer n. This is the number of
permutations of n distinct objects. That is, it’s the number of ways to arrange
n things in a row: There are n choices for the first thing; for each choice of
first thing, there are n − 1 choices for the second; for each of these n(n − 1)
choices, there are n − 2 for the third; and so on, giving n(n − 1)(n − 2) … (1)
arrangements in all. Here are the first few values of the factorial function.

$$\begin{array}{c|ccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ \hline
n! & 1 & 1 & 2 & 6 & 24 & 120 & 720 & 5040 & 40320 & 362880 & 3628800
\end{array}$$


It’s useful to know a few factorial facts, like the first six or so values, and the 
fact that 10! is about 3½ million plus change; another interesting fact is that
the number of digits in n! exceeds n when n ≥ 25.
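The claim about the number of digits can be checked exactly with Python's integers (200! has only 375 digits, comfortably within Python's default limit on integer-to-string conversion). A quick sketch:

```python
from math import factorial

# Number of decimal digits of n! for 1 <= n <= 200.
digits = {n: len(str(factorial(n))) for n in range(1, 201)}

print(factorial(10))                               # 3628800
print([n for n in digits if digits[n] > n][:3])    # [25, 26, 27]
print(all(digits[n] > n for n in range(25, 201)))  # True
```

Once n ≥ 10, each step multiplies by at least 11, adding more than one digit, so the pattern continues beyond the range checked.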

We can prove that n! is plenty big by using something like Gauss’s trick
of Chapter 1:

$$n!^2 = (1\cdot 2\cdot \ldots \cdot n)(n\cdot \ldots \cdot 2\cdot 1) = \prod_{k=1}^{n} k(n+1-k).$$

We have n ≤ k(n + 1 − k) ≤ ¼(n + 1)², since the quadratic polynomial
k(n + 1 − k) = ¼(n + 1)² − (k − ½(n + 1))² has its smallest value at k = 1
and its largest value at k = ½(n + 1). Therefore

$$\prod_{k=1}^{n} n \;\le\; n!^2 \;\le\; \prod_{k=1}^{n} \frac{(n+1)^2}{4};$$

that is,

$$n^{n/2} \le n! \le \frac{(n+1)^n}{2^n}. \tag{4.22}$$

This relation tells us that the factorial function grows exponentially!!
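Both sides of (4.22) can be tested exactly, without floating point, by squaring the left inequality and clearing the denominator in the right one. A minimal Python check over a modest range:

```python
from math import factorial

for n in range(1, 100):
    f = factorial(n)
    assert n ** n <= f * f                  # n^(n/2) <= n!, squared
    assert f * 2 ** n <= (n + 1) ** n       # n! <= (n+1)^n / 2^n, times 2^n
print("(4.22) holds for 1 <= n < 100")
```

Equality occurs at small n, for example n = 1 on both sides, which is why the comparisons are `<=`.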

To approximate n! more accurately for large n we can use Stirling’s 
formula, which we will derive in Chapter 9: 


$$n! \sim \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}. \tag{4.23}$$

And a still more precise approximation tells us the asymptotic relative error:
Stirling’s formula undershoots n! by a factor of about 1/(12n). Even for fairly
small n this more precise estimate is pretty good. For example, Stirling’s
approximation (4.23) gives a value near 3598696 when n = 10, and this is
about 0.83% ≈ 1/120 too small. Good stuff, asymptotics.
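The n = 10 example and the 1/(12n) rule can be reproduced in floating point. A short Python sketch comparing the relative shortfall of (4.23) with 1/(12n):

```python
from math import e, factorial, pi, sqrt


def stirling(n: int) -> float:
    """Stirling's approximation (4.23): sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n


print(round(stirling(10)))                 # 3598696
for n in (10, 20, 40):
    rel = factorial(n) / stirling(n) - 1   # how much Stirling undershoots n!
    print(n, rel, 1 / (12 * n))            # rel is close to 1/(12n)
```

The agreement improves as n grows; the remaining discrepancy is roughly 1/(288n²).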

But let’s get back to primes. We’d like to determine, for any given 
prime p, the largest power of p that divides n!; that is, we want the exponent 
of p in n!’s unique factorization. We denote this number by εₚ(n!), and we
start our investigations with the small case p = 2 and n = 10. Since 10! is the
product of ten numbers, ε₂(10!) can be found by summing the powers-of-2
contributions of those ten numbers; this calculation corresponds to summing
the columns of the following array:

$$\begin{array}{l|cccccccccc|l}
 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & \text{powers of 2}\\ \hline
\text{divisible by 2} & & \times & & \times & & \times & & \times & & \times & 5 = \lfloor 10/2 \rfloor\\
\text{divisible by 4} & & & & \times & & & & \times & & & 2 = \lfloor 10/4 \rfloor\\
\text{divisible by 8} & & & & & & & & \times & & & 1 = \lfloor 10/8 \rfloor\\ \hline
\text{powers of 2} & 0 & 1 & 0 & 2 & 0 & 1 & 0 & 3 & 0 & 1 & 8
\end{array}$$

A powerful ruler.

(The column sums form what’s sometimes called the ruler function ρ(k),
because they resemble the pattern of line lengths that mark fractions of an
inch on a ruler.) The sum of these ten sums is 8; hence 2⁸ divides 10!
but 2⁹ doesn’t.

There’s also another way: We can sum the contributions of the rows.
The first row marks the numbers that contribute a power of 2 (and thus are
divisible by 2); there are ⌊10/2⌋ = 5 of them. The second row marks those
that contribute an additional power of 2; there are ⌊10/4⌋ = 2 of them. And
the third row marks those that contribute yet another; there are ⌊10/8⌋ = 1 of
them. These account for all contributions, so we have ε₂(10!) = 5 + 2 + 1 = 8.
For general n this method gives

$$\varepsilon_2(n!) = \Bigl\lfloor \frac n2 \Bigr\rfloor + \Bigl\lfloor \frac n4 \Bigr\rfloor + \Bigl\lfloor \frac n8 \Bigr\rfloor + \cdots = \sum_{k\ge 1} \Bigl\lfloor \frac{n}{2^k} \Bigr\rfloor.$$

This sum is actually finite, since the summand is zero when 2ᵏ > n. Therefore
it has only ⌊lg n⌋ nonzero terms, and it’s computationally quite easy. For
instance, when n = 100 we have

$$\varepsilon_2(100!) = 50 + 25 + 12 + 6 + 3 + 1 = 97.$$

Each term is just the floor of half the previous term. This is true for all n,
because as a special case of (3.11) we have ⌊n/2ᵏ⁺¹⌋ = ⌊⌊n/2ᵏ⌋/2⌋. It’s
especially easy to see what’s going on here when we write the numbers in binary:

$$\begin{array}{rcll}
100 &=& (1100100)_2 &= 100\\
\lfloor 100/2 \rfloor &=& (110010)_2 &= 50\\
\lfloor 100/4 \rfloor &=& (11001)_2 &= 25\\
\lfloor 100/8 \rfloor &=& (1100)_2 &= 12\\
\lfloor 100/16 \rfloor &=& (110)_2 &= 6\\
\lfloor 100/32 \rfloor &=& (11)_2 &= 3\\
\lfloor 100/64 \rfloor &=& (1)_2 &= 1
\end{array}$$

We merely drop the least significant bit from one term to get the next.
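"Drop the least significant bit" is a right shift, so ε₂(n!) is a few lines of code. A Python sketch, with a cross-check against the binary formula (4.24) that the text derives next, n minus the number of 1 bits in n:

```python
def eps2_factorial(n: int) -> int:
    """epsilon_2(n!) as floor(n/2) + floor(n/4) + ..., each term half the last."""
    total = 0
    while n:
        n >>= 1          # drop the least significant bit: floor(n/2)
        total += n
    return total


print(eps2_factorial(10), eps2_factorial(100))   # 8 97
# Cross-check: n minus the number of 1 bits in n.
print(all(eps2_factorial(n) == n - bin(n).count("1") for n in range(10_000)))  # True
```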
The binary representation also shows us how to derive another formula,

$$\varepsilon_2(n!) = n - \nu_2(n), \tag{4.24}$$

where ν₂(n) is the number of 1’s in the binary representation of n. This
simplification works because each 1 that contributes 2ᵐ to the value of n
contributes 2ᵐ⁻¹ + 2ᵐ⁻² + ⋯ + 2⁰ = 2ᵐ − 1 to the value of ε₂(n!).
Generalizing our findings to an arbitrary prime p, we have

$$\varepsilon_p(n!) = \Bigl\lfloor \frac np \Bigr\rfloor + \Bigl\lfloor \frac n{p^2} \Bigr\rfloor + \Bigl\lfloor \frac n{p^3} \Bigr\rfloor + \cdots = \sum_{k\ge 1} \Bigl\lfloor \frac{n}{p^k} \Bigr\rfloor \tag{4.25}$$

by the same reasoning as before.
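Formula (4.25) can be compared against brute force: compute n! exactly and divide out p until it no longer divides. A Python sketch (function names are ours):

```python
from math import factorial


def eps_factorial(p: int, n: int) -> int:
    """epsilon_p(n!) by (4.25): the sum of floor(n/p^k) over k >= 1 (p prime)."""
    total, q = 0, p
    while q <= n:                 # terms with p^k > n are zero
        total += n // q
        q *= p
    return total


def eps_direct(p: int, n: int) -> int:
    """Exponent of p in n!, by dividing n! by p until it no longer divides."""
    f, e = factorial(n), 0
    while f % p == 0:
        f //= p
        e += 1
    return e


print(all(eps_factorial(p, n) == eps_direct(p, n)
          for p in (2, 3, 5, 7) for n in range(80)))   # True
print(eps_factorial(2, 6), eps_factorial(3, 6))       # 4 2, since 6! = 2^4 * 3^2 * 5
```

The formula needs only about log_p n integer divisions, while the direct method has to build n! itself.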

About how large is εₚ(n!)? We get an easy (but good) upper bound by
simply removing the floor from the summand and then summing an infinite
geometric progression:

$$\varepsilon_p(n!) < \frac np + \frac n{p^2} + \frac n{p^3} + \cdots = \frac np\Bigl(1 + \frac1p + \frac1{p^2} + \cdots\Bigr) = \frac np\Bigl(\frac{p}{p-1}\Bigr) = \frac{n}{p-1}.$$

For p = 2 and n = 100 this inequality says that 97 < 100. Thus the upper
bound 100 is not only correct, it’s also close to the true value 97. In
fact, the true value n − ν₂(n) is ∼ n in general, because ν₂(n) ≤ ⌊lg n⌋ + 1 is
asymptotically much smaller than n.

When p = 2 and 3 our formulas give ε₂(n!) ∼ n and ε₃(n!) ∼ n/2, so
it seems reasonable that every once in a while ε₃(n!) should be exactly half
as big as ε₂(n!). For example, this happens when n = 6 and n = 7, because
6! = 2⁴·3²·5 = 7!/7. But nobody has yet proved that such coincidences
happen infinitely often.

The bound on εₚ(n!) in turn gives us a bound on $p^{\varepsilon_p(n!)}$, which is p’s
contribution to n!:

$$p^{\varepsilon_p(n!)} < p^{n/(p-1)}.$$

And we can simplify this formula (at the risk of greatly loosening the upper
bound) by noting that $p \le 2^{p-1}$; hence $p^{n/(p-1)} \le (2^{p-1})^{n/(p-1)} = 2^n$. In
other words, the contribution that any prime makes to n! is less than 2ⁿ.
We can use this observation to get another proof that there are infinitely
many primes. For if there were only the k primes 2, 3, …, Pₖ, then we’d
have n! < (2ⁿ)ᵏ = 2ⁿᵏ for all n > 1, since each prime can contribute at most
a factor of 2ⁿ − 1. But we can easily contradict the inequality n! < 2ⁿᵏ by
choosing n large enough, say n = 2²ᵏ. Then

$$n! < 2^{nk} = 2^{2^{2k}k} = n^{n/2},$$

contradicting the inequality $n! \ge n^{n/2}$ that we derived in (4.22). There are
infinitely many primes, still.

We can even beef up this argument to get a crude bound on π(n), the
number of primes not exceeding n. Every such prime contributes a factor of
less than 2ⁿ to n!; so, as before,

$$n! < 2^{n\pi(n)}.$$

If we replace n! here by Stirling’s approximation (4.23), which is a lower
bound, and take logarithms, we get

$$n\pi(n) > n \lg(n/e) + \tfrac12 \lg(2\pi n);$$

hence

$$\pi(n) > \lg(n/e).$$

This lower bound is quite weak, compared with the actual value π(n) ∼
n/ln n, because log n is much smaller than n/log n when n is large. But we
didn’t have to work very hard to get it, and a bound is a bound.


4.5 RELATIVE PRIMALITY 


Like perpendicular 
lines don’t have
a common direc- 
tion, perpendicular 
numbers don’t have 
common factors. 


When gcd(m, n) = 1 , the integers m and n have no prime factors in 
common and we say that they’re relatively prime. 

This concept is so important in practice, we ought to have a special 
notation for it; but alas, number theorists haven’t agreed on a very good one 
yet. Therefore we cry: Hear us, O Mathematicians of the World! Let
us not wait any longer! We can make many formulas clearer by
adopting a new notation now! Let us agree to write ‘m ⊥ n’, and
to say “m is prime to n,” if m and n are relatively prime. In other
words, let us declare that

$$m \perp n \iff m, n \text{ are integers and } \gcd(m,n) = 1. \tag{4.26}$$
A fraction m/n is in lowest terms if and only if m ⊥ n. Since we
reduce fractions to lowest terms by casting out the largest common factor of
numerator and denominator, we suspect that, in general,

$$m/\gcd(m,n) \perp n/\gcd(m,n); \tag{4.27}$$

and indeed this is true. It follows from a more general law, gcd(km, kn) =
k gcd(m, n), proved in exercise 14.

The ⊥ relation has a simple formulation when we work with the
prime-exponent representations of numbers, because of the gcd rule (4.14):

$$m \perp n \iff \min(m_p, n_p) = 0 \quad \text{for all } p. \tag{4.28}$$

Furthermore, since mₚ and nₚ are nonnegative, we can rewrite this as

$$m \perp n \iff m_p n_p = 0 \quad \text{for all } p. \tag{4.29}$$

The dot product is
zero, like orthogonal
vectors.

And now we can prove an important law by which we can split and combine
two ⊥ relations with the same left-hand side:

$$k \perp m \text{ and } k \perp n \iff k \perp mn. \tag{4.30}$$

In view of (4.29), this law is another way of saying that kₚmₚ = 0 and
kₚnₚ = 0 if and only if kₚ(mₚ + nₚ) = 0, when mₚ and nₚ are nonnegative.
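Laws such as (4.27) and (4.30) are easy to spot-check over small ranges using the standard `math.gcd`. A Python sketch, where `perp` is our name for the ⊥ relation of (4.26):

```python
from math import gcd


def perp(m: int, n: int) -> bool:
    """m ⊥ n in the sense of (4.26): gcd(m, n) == 1."""
    return gcd(m, n) == 1


small = range(1, 60)
# (4.27): casting out the gcd leaves numerator and denominator relatively prime
ok_427 = all(perp(m // gcd(m, n), n // gcd(m, n)) for m in small for n in small)
# (4.30): k ⊥ m and k ⊥ n  if and only if  k ⊥ mn
tiny = range(1, 40)
ok_430 = all((perp(k, m) and perp(k, n)) == perp(k, m * n)
             for k in tiny for m in tiny for n in tiny)
print(perp(12, 35), perp(12, 18), ok_427, ok_430)   # True False True True
```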

Interesting how
mathematicians
will say “discovered”
when absolutely
anyone else
would have said
“invented.”

There’s a beautiful way to construct the set of all nonnegative fractions
m/n with m ⊥ n, called the Stern–Brocot tree because it was discovered
independently by Moriz Stern [339], a German mathematician, and Achille
Brocot [40], a French clockmaker. The idea is to start with the two fractions
(0/1, 1/0) and then to repeat the following operation as many times as desired:

$$\text{Insert } \frac{m+m'}{n+n'} \text{ between two adjacent fractions } \frac mn \text{ and } \frac{m'}{n'}.$$

The new fraction (m + m′)/(n + n′) is called the mediant of m/n and m′/n′.
For example, the first step gives us one new entry between 0/1 and 1/0,

$$\frac01,\ \frac11,\ \frac10;$$

and the next gives two more:

$$\frac01,\ \frac12,\ \frac11,\ \frac21,\ \frac10.$$

The next gives four more,

$$\frac01,\ \frac13,\ \frac12,\ \frac23,\ \frac11,\ \frac32,\ \frac21,\ \frac31,\ \frac10;$$
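and then we’ll get 8, 16, and so on. The entire array can be regarded as an
infinite binary tree structure whose top levels look like this:

$$\begin{array}{c}
\frac01 \hspace{12em} \frac10\\[6pt]
\frac11\\[6pt]
\frac12 \hspace{7em} \frac21\\[6pt]
\frac13 \quad\;\; \frac23 \quad\;\; \frac32 \quad\;\; \frac31\\[6pt]
\frac14 \;\; \frac25 \;\; \frac35 \;\; \frac34 \;\; \frac43 \;\; \frac53 \;\; \frac52 \;\; \frac41\\[6pt]
\frac15\, \frac27\, \frac38\, \frac37\, \frac47\, \frac58\, \frac57\, \frac45\, \frac54\, \frac75\, \frac85\, \frac74\, \frac73\, \frac83\, \frac72\, \frac51
\end{array}$$

I guess 1/0 is
infinity, “in lowest
terms.”

Conserve parody.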

Each fraction is (m + m′)/(n + n′), where m/n is the nearest ancestor above and to the left,
and m′/n′ is the nearest ancestor above and to the right. (An “ancestor” is a
fraction that’s reachable by following the branches upward.) Many patterns 
can be observed in this tree. 

Why does this construction work? Why, for example, does each mediant 
fraction (m + m′)/(n + n′) turn out to be in lowest terms when it appears in
this tree? (If m, m′, n, and n′ were all odd, we’d get even/even; somehow the
construction guarantees that fractions with odd numerators and denominators 
never appear next to each other.) And why do all possible fractions m/n occur 
exactly once? Why can’t a particular fraction occur twice, or not at all? 

All of these questions have amazingly simple answers, based on the
following fundamental fact: If m/n and m′/n′ are consecutive fractions at any
stage of the construction, we have

$$m'n - mn' = 1. \tag{4.31}$$

This relation is true initially (1·1 − 0·0 = 1); and when we insert a new
mediant (m + m′)/(n + n′), the new cases that need to be checked are

$$\begin{aligned}
(m + m')n - m(n + n') &= 1;\\
m'(n + n') - (m + m')n' &= 1.
\end{aligned}$$

Both of these equations are equivalent to the original condition (4.31) that
they replace. Therefore (4.31) is invariant at all stages of the construction.

Furthermore, if m/n < m′/n′ and if all values are nonnegative, it’s easy
to verify that

$$\frac mn < \frac{m+m'}{n+n'} < \frac{m'}{n'}.$$

A mediant fraction isn’t halfway between its progenitors, but it does lie
somewhere in between. Therefore the construction preserves order, and we couldn’t
possibly get the same fraction in two different places.
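The mediant construction and its invariants can be watched directly. A Python sketch that repeats the insertion step, keeping fractions as (numerator, denominator) pairs because 1/0 is not a valid `fractions.Fraction`, and asserts the properties argued in the text (names are ours):

```python
from math import gcd


def stern_brocot(steps: int) -> list[list[tuple[int, int]]]:
    """Sequences produced by `steps` rounds of mediant insertion from 0/1, 1/0."""
    seq = [(0, 1), (1, 0)]
    history = []
    for _ in range(steps):
        new = [seq[0]]
        for (m, n), (m2, n2) in zip(seq, seq[1:]):   # m2/n2 plays the role of m'/n'
            new += [(m + m2, n + n2), (m2, n2)]       # mediant, then right neighbor
        seq = new
        history.append(seq)
    return history


history = stern_brocot(6)
print([f for f in history[2] if f not in history[1]])   # [(1, 3), (2, 3), (3, 2), (3, 1)]

for seq in history:
    assert len(set(seq)) == len(seq)                    # no fraction appears twice
    assert all(gcd(m, n) == 1 for m, n in seq)          # all in lowest terms
    for (m, n), (m2, n2) in zip(seq, seq[1:]):
        assert m2 * n - m * n2 == 1                     # invariant (4.31)
```

Each round doubles the number of gaps, so after s rounds the sequence has 2ˢ + 1 entries.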

One question still remains. Can any positive fraction a/b with a 1 b 
possibly be omitted? The answer is no, because we can confine the construc- 
tion to the immediate neighborhood of a/b, and in this region the behavior 
is easy to analyze: Initially we have 


True, but if you get 
a compound frac- 
ture you’d better go 
see a doctor. 


$$\frac{m}{n} = \frac{0}{1} < \left(\frac{a}{b}\right) < \frac{1}{0} = \frac{m'}{n'},$$


where we put parentheses around a/b to indicate that it’s not really present
yet. Then if at some stage we have

$$\frac{m}{n} < \left(\frac{a}{b}\right) < \frac{m'}{n'},$$

the construction forms (m + m')/(n + n') and there are three cases. Either
(m + m')/(n + n') = a/b and we win; or (m + m')/(n + n') < a/b and we
can set m ← m + m', n ← n + n'; or (m + m')/(n + n') > a/b and we
can set m' ← m + m', n' ← n + n'. This process cannot go on indefinitely,
because the conditions

$$\frac{a}{b} - \frac{m}{n} > 0 \qquad\text{and}\qquad \frac{m'}{n'} - \frac{a}{b} > 0$$

imply that

$$an - bm \ge 1 \qquad\text{and}\qquad bm' - an' \ge 1;$$

hence

$$(m' + n')(an - bm) + (m + n)(bm' - an') \ge m' + n' + m + n;$$

and this is the same as $a + b \ge m' + n' + m + n$ by (4.31), because the
left-hand side simplifies to $(a + b)(m'n - mn')$. Either m or n or
m' or n' increases at each step, so we must win after at most a + b steps.
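The argument just given is effectively an algorithm. The following Python sketch assumes a > 0, b > 0 and gcd(a, b) = 1 (the function name is ours). It narrows the interval exactly as described and counts the mediants it forms, so the bound of a + b steps can be spot-checked.

```python
from math import gcd


def mediants_until_found(a, b):
    """Count the mediants formed before a/b itself appears, keeping
    m/n < a/b < m'/n' throughout (starting with 0/1 and 1/0)."""
    assert a > 0 and b > 0 and gcd(a, b) == 1
    m, n, m2, n2 = 0, 1, 1, 0
    steps = 0
    while True:
        steps += 1
        p, q = m + m2, n + n2          # the next mediant p/q
        if p * b == a * q:             # p/q = a/b: we win
            return steps
        if p * b < a * q:              # p/q < a/b: it becomes the new m/n
            m, n = p, q
        else:                          # p/q > a/b: it becomes the new m'/n'
            m2, n2 = p, q
```

For instance, `mediants_until_found(5, 7)` forms 1/1, 1/2, 2/3, 3/4 and then 5/7, five steps in all, well within 5 + 7. Cross-multiplication keeps every comparison in exact integer arithmetic.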

The Farey series of order N, denoted by $\mathcal{F}_N$, is the set of all reduced
fractions between 0 and 1 whose denominators are N or less, arranged in
increasing order. For example, if N = 6 we have

$$\mathcal{F}_6 = \frac01, \frac16, \frac15, \frac14, \frac13, \frac25, \frac12, \frac35, \frac23, \frac34, \frac45, \frac56, \frac11.$$

We can obtain $\mathcal{F}_N$ in general by starting with $\mathcal{F}_1 = \frac01, \frac11$ and then inserting
mediants whenever it’s possible to do so without getting a denominator that
is too large. We don’t miss any fractions in this way, because we know that
the Stern-Brocot construction doesn’t miss any, and because a mediant with
denominator ≤ N is never formed from a fraction whose denominator is > N.
(In other words, $\mathcal{F}_N$ defines a subtree of the Stern-Brocot tree, obtained by
pruning off unwanted branches.) It follows that m'n − mn' = 1 whenever
m/n and m'/n' are consecutive elements of a Farey series.

Farey ’nough.
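A direct transcription of this construction into Python might look as follows. It is only a sketch (names are ours): fractions are (numerator, denominator) pairs, and every mediant whose denominator does not exceed N is inserted repeatedly until nothing changes.

```python
def farey(N):
    """F_N, built from F_1 = 0/1, 1/1 by inserting mediants whose
    denominators are at most N, until no more can be inserted."""
    seq = [(0, 1), (1, 1)]
    changed = True
    while changed:
        changed = False
        out = [seq[0]]
        for (m, n), (m2, n2) in zip(seq, seq[1:]):
            if n + n2 <= N:                    # the mediant is small enough
                out.append((m + m2, n + n2))
                changed = True
            out.append((m2, n2))
        seq = out
    return seq
```

Calling `farey(6)` yields the 13 fractions of $\mathcal{F}_6$ listed earlier, and consecutive entries satisfy m'n − mn' = 1.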

This method of construction reveals that $\mathcal{F}_N$ can be obtained in a simple
way from $\mathcal{F}_{N-1}$: We simply insert the fraction (m + m')/N between consecutive
fractions m/n, m'/n' of $\mathcal{F}_{N-1}$ whose denominators sum to N. For
example, it’s easy to obtain $\mathcal{F}_7$ from the elements of $\mathcal{F}_6$, by inserting
$\frac17, \frac27, \ldots, \frac67$ according to the stated rule:

$$\mathcal{F}_7 = \frac01, \frac17, \frac16, \frac15, \frac14, \frac27, \frac13, \frac25, \frac37, \frac12, \frac47, \frac35, \frac23, \frac57, \frac34, \frac45, \frac56, \frac67, \frac11.$$

When N is prime, N — 1 new fractions will appear; but otherwise we’ll have 
fewer than N — 1 , because this process generates only numerators that are 
relatively prime to N. 
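The step from $\mathcal{F}_{N-1}$ to $\mathcal{F}_N$ is equally short in code. This Python sketch (the function name is hypothetical) inserts (m + m')/N exactly where neighboring denominators add up to N. Fed $\mathcal{F}_6$, it reproduces $\mathcal{F}_7$ as given in the text.

```python
def next_farey(prev, N):
    """Build F_N from F_{N-1}: between neighbors m/n, m'/n' whose
    denominators sum to N, insert the mediant (m + m')/N."""
    out = [prev[0]]
    for (m, n), (m2, n2) in zip(prev, prev[1:]):
        if n + n2 == N:
            out.append((m + m2, N))
        out.append((m2, n2))
    return out


F6 = [(0, 1), (1, 6), (1, 5), (1, 4), (1, 3), (2, 5), (1, 2),
      (3, 5), (2, 3), (3, 4), (4, 5), (5, 6), (1, 1)]
F7 = next_farey(F6, 7)
new = [frac for frac in F7 if frac[1] == 7]   # the six fractions k/7
```

Since 7 is prime, all six fractions k/7 appear. Going on to N = 8 adds only 1/8, 3/8, 5/8, and 7/8, because only odd numerators are prime to 8.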

Long ago in (4.5) we proved, in different words, that whenever m ⊥ n
and 0 < m ≤ n we can find integers a and b such that

$$ma - nb = 1 . \tag{4.32}$$

(Actually we said m'm + n'n = gcd(m, n), but we can write 1 for gcd(m, n),
a for m', and b for −n'.) The Farey series gives us another proof of (4.32),
because we can let b/a be the fraction that precedes m/n in $\mathcal{F}_n$. Thus (4.5)
is just (4.31) again. For example, one solution to 3a − 7b = 1 is a = 5, b = 2,
since $\frac25$ precedes $\frac37$ in $\mathcal{F}_7$. This construction implies that we can always find a
solution to (4.32) with 0 ≤ b < a ≤ n, if 0 < m ≤ n. Similarly, if 0 ≤ n < m
and m ⊥ n, we can solve (4.32) with 0 < a ≤ b ≤ m by letting a/b be the
fraction that follows n/m in $\mathcal{F}_m$.
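Here is a minimal Python sketch of this second proof, assuming 0 < m ≤ n and m ⊥ n (the function name is ours). Rather than building all of $\mathcal{F}_n$, it runs the mediant search toward m/n and returns the left ancestor b/a at the moment m/n appears. Any other fraction strictly between the two ancestors has a denominator larger than n, so that ancestor is exactly the predecessor of m/n in $\mathcal{F}_n$.

```python
from math import gcd


def solve_ma_minus_nb(m, n):
    """Return (a, b) with m*a - n*b == 1, for 0 < m <= n and gcd(m, n) == 1.

    b/a is the left ancestor of m/n in the Stern-Brocot search, which is
    the fraction preceding m/n in the Farey series F_n."""
    assert 0 < m <= n and gcd(m, n) == 1
    lm, ln, rm, rn = 0, 1, 1, 0        # left ancestor lm/ln, right ancestor rm/rn
    while True:
        p, q = lm + rm, ln + rn        # next mediant (always in lowest terms)
        if (p, q) == (m, n):
            return ln, lm              # a = ln, b = lm
        if p * n < m * q:              # p/q < m/n
            lm, ln = p, q
        else:
            rm, rn = p, q
```

For example, `solve_ma_minus_nb(3, 7)` returns (5, 2), matching 3·5 − 7·2 = 1.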

Sequences of three consecutive terms in a Farey series have an amazing 
property that is proved in exercise 61. But we had better not discuss the 
Farey series any further, because the entire Stern-Brocot tree turns out to be 
even more interesting. 

We can, in fact, regard the Stern-Brocot tree as a number system for 
representing rational numbers, because each positive, reduced fraction occurs 
exactly once. Let’s use the letters L and R to stand for going down to the 
left or right branch as we proceed from the root of the tree to a particular 
fraction; then a string of L’s and R’s uniquely identifies a place in the tree. 
For example, LRRL means that we go left from $\frac11$ down to $\frac12$, then right to $\frac23$,
then right to $\frac34$, then left to $\frac57$. We can consider LRRL to be a representation
of $\frac57$. Every positive fraction gets represented in this way as a unique string
of L’s and R’s. 

Well, actually there’s a slight problem: The fraction $\frac11$ corresponds to
the empty string, and we need a notation for that. Let’s agree to call it I, 
because that looks something like 1 and it stands for “identity.” 
This representation raises two natural questions: (1) Given positive integers
m and n with m ⊥ n, what is the string of L’s and R’s that corresponds
to m/n? (2) Given a string of L’s and R’s, what fraction corresponds to it? 
Question 2 seems easier, so let’s work on it first. We define 


f(S) = fraction corresponding to S 

when S is a string of L’s and R’s. For example, f(LRRL) = $\frac57$.

According to the construction, f (S ) = (m + m')/(n + n') if m/n and 
m' /n' are the closest fractions preceding and following S in the upper levels 
of the tree. Initially m/n = 0/1 and m'/n' = 1/0; then we successively 
replace either m/n or m'/n' by the mediant (m + m')/(n + n') as we move
right or left in the tree, respectively. 

How can we capture this behavior in mathematical formulas that are 
easy to deal with? A bit of experimentation suggests that the best way is to 
maintain a 2 × 2 matrix

$$M(S) = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$$

that holds the four quantities involved in the ancestral fractions m/n and
m'/n' enclosing f(S). We could put the m’s on top and the n’s on the bottom,
fractionwise; but this upside-down arrangement works out more nicely
because we have $M(I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ when the process starts, and $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is traditionally
called the identity matrix I.

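A step to the left replaces n' by n + n' and m' by m + m'; hence

$$M(SL) = \begin{pmatrix} n & n + n' \\ m & m + m' \end{pmatrix} = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = M(S)\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

(This is a special case of the general rule

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}$$

for multiplying 2 × 2 matrices.) Similarly it turns out that

$$M(SR) = \begin{pmatrix} n + n' & n' \\ m + m' & m' \end{pmatrix} = M(S)\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$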

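If you’re clueless about matrices, don’t panic; this book uses them only here.

Therefore if we define L and R as 2 × 2 matrices,

$$L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \tag{4.33}$$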
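we get the simple formula M(S) = S, by induction on the length of S. Isn’t
that nice? (The letters L and R serve dual roles, as matrices and as letters in
the string representation.) For example,

$$M(LRRL) = LRRL = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix};$$

the ancestral fractions that enclose LRRL = $\frac57$ are $\frac23$ and $\frac34$. And this construction
gives us the answer to Question 2: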

$$f(S) = f\begin{pmatrix} n & n' \\ m & m' \end{pmatrix} = \frac{m + m'}{n + n'} . \tag{4.34}$$
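Formulas (4.33) and (4.34) translate directly into code. The following Python sketch (names ours; matrices are plain nested tuples, no matrix library assumed) multiplies out a string of L’s and R’s and applies f. It reproduces M(LRRL) and f(LRRL) = 5/7.

```python
# 2 x 2 matrices as ((top-left, top-right), (bottom-left, bottom-right)).
IDENTITY = ((1, 0), (0, 1))
L = ((1, 1), (0, 1))
R = ((1, 0), (1, 1))


def matmul(A, B):
    """The general rule for multiplying 2 x 2 matrices."""
    (a, b), (c, d) = A
    (w, x), (y, z) = B
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))


def M(s):
    """M(S) = S: the product of the L and R matrices for the letters of s."""
    result = IDENTITY
    for letter in s:
        result = matmul(result, L if letter == "L" else R)
    return result


def f(S):
    """(4.34): the fraction (m + m')/(n + n') for S = ((n, n'), (m, m')),
    returned as a (numerator, denominator) pair."""
    (n, n2), (m, m2) = S
    return (m + m2, n + n2)
```

Here `M("LRRL")` evaluates to ((3, 4), (2, 3)), and `f` of it gives (5, 7).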


How about Question 1? That’s easy, now that we understand the fundamental
connection between tree nodes and 2 × 2 matrices. Given a pair of
positive integers m and n, with m ⊥ n, we can find the position of m/n in
the Stern-Brocot tree by “binary search” as follows:

S := I;
while m/n ≠ f(S) do
    if m/n < f(S) then (output(L); S := SL)
    else (output(R); S := SR) .

This outputs the desired string of L’s and R’s. 
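Here is a runnable version of this binary search, sketched in Python. It assumes m, n > 0 and m ⊥ n, uses the standard `fractions.Fraction` class for exact comparisons, and stores S as a nested tuple. These representation choices are ours.

```python
from fractions import Fraction

L = ((1, 1), (0, 1))
R = ((1, 0), (1, 1))


def matmul(A, B):
    (a, b), (c, d) = A
    (w, x), (y, z) = B
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))


def f(S):
    """(4.34) as an exact Fraction."""
    (n, n2), (m, m2) = S
    return Fraction(m + m2, n + n2)


def stern_brocot_string(m, n):
    """Binary search for m/n in the Stern-Brocot tree, keeping the state S."""
    target = Fraction(m, n)
    S = ((1, 0), (0, 1))             # S := I
    out = []
    while target != f(S):
        if target < f(S):
            out.append("L")
            S = matmul(S, L)         # S := SL
        else:
            out.append("R")
            S = matmul(S, R)         # S := SR
    return "".join(out)
```

For example, `stern_brocot_string(5, 7)` returns "LRRL", and 1/1 gives the empty string that the text calls I.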

There’s also another way to do the same job, by changing m and n instead 
of maintaining the state S. If S is any 2 × 2 matrix, we have

$$f(RS) = f(S) + 1$$

because RS is like S but with the top row added to the bottom row. (Let’s
look at it in slow motion:

$$S = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}; \qquad RS = \begin{pmatrix} n & n' \\ m + n & m' + n' \end{pmatrix};$$

hence f(S) = (m + m')/(n + n') and f(RS) = ((m + n) + (m' + n'))/(n + n').)
If we carry out the binary search algorithm on a fraction m/n with m > n, 
the first output will be R; hence the subsequent behavior of the algorithm will 
have f(S) exactly 1 greater than if we had begun with (m − n)/n instead of
m/n. A similar property holds for L, and we have

$$\frac{m}{n} = f(RS) \iff \frac{m - n}{n} = f(S), \qquad \text{when } m > n;$$
$$\frac{m}{n} = f(LS) \iff \frac{m}{n - m} = f(S), \qquad \text{when } m < n.$$
This means that we can transform the binary search algorithm to the following
matrix-free procedure:

while m ≠ n do
    if m < n then (output(L); n := n − m)
    else (output(R); m := m − n) .

For example, given m/n = 5/7, we have successively 

m =     5   5   3   1   1
n =     7   2   2   2   1
output    L   R   R   L

in the simplified algorithm. 
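The matrix-free procedure is shorter still. Here is a Python sketch (names ours) that also records the successive values of m and n, so that the trace shown for 5/7 can be reproduced. If m and n are not relatively prime, the loop simply stops at m = n = gcd(m, n) and outputs the string for the reduced fraction.

```python
def stern_brocot_trace(m, n):
    """Matrix-free conversion of m/n (m, n > 0) to a string of L's and R's.
    Returns the string and the list of (m, n) values visited."""
    out, trace = [], [(m, n)]
    while m != n:
        if m < n:
            out.append("L")
            n -= m
        else:
            out.append("R")
            m -= n
        trace.append((m, n))
    return "".join(out), trace
```

Calling `stern_brocot_trace(5, 7)` gives "LRRL" with the pairs (5, 7), (5, 2), (3, 2), (1, 2), (1, 1).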

Irrational numbers don’t appear in the Stern-Brocot tree, but all the 
rational numbers that are “close” to them do. For example, if we try the 
binary search algorithm with the number e = 2.71828 . . . , instead of with a 
fraction m/n, we’ll get an infinite string of L’s and R’s that begins 

RRLRRLRLLLLRLRRRRRRLRLLLLLLLLRLR .... 

We can consider this infinite string to be the representation of e in the Stern- 
Brocot number system, just as we can represent e as an infinite decimal 
2.718281828459... or as an infinite binary fraction $(10.101101111110\ldots)_2$.
Incidentally, it turns out that e’s representation has a regular pattern in the 
Stern-Brocot system: 

$$e = RL^0RLR^2LRL^4RLR^6LRL^8RLR^{10}LRL^{12}RL\ldots;$$

this is equivalent to a special case of something that Euler [105] discovered 
when he was 24 years old. 

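From this representation we can deduce that the fractions

$$\begin{array}{ccccccccccccccccc}
R & R & L & R & R & L & R & L & L & L & L & R & L & R & R & R & R \\
\frac11 & \frac21 & \frac31 & \frac52 & \frac83 & \frac{11}{4} & \frac{19}{7} & \frac{30}{11} & \frac{49}{18} & \frac{68}{25} & \frac{87}{32} & \frac{106}{39} & \frac{193}{71} & \frac{299}{110} & \frac{492}{181} & \frac{685}{252} & \frac{878}{323}
\end{array}$$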

are the simplest rational upper and lower approximations to e. For if m/n
does not appear in this list, then some fraction in this list whose numerator
is ≤ m and whose denominator is ≤ n lies between m/n and e. For example,
$\frac{27}{10}$ is not as simple an approximation as $\frac{19}{7}$ = 2.714..., which appears in
the list and is closer to e. We can see this because the Stern-Brocot tree
not only includes all rationals, it includes them in order, and because all
fractions with small numerator and denominator appear above all less simple
ones. Thus, $\frac{27}{10}$ = RRLRRLL is less than $\frac{19}{7}$ = RRLRRL, which is less than
e = RRLRRLR.... Excellent approximations can be found in this way. For
example, $\frac{1264}{465}$ ≈ 2.718280 agrees with e to six decimal places; we obtained this
fraction from the first 19 letters of e’s Stern-Brocot representation, and the
accuracy is about what we would get with 19 bits of e’s binary representation.

We can find the infinite representation of an irrational number α by a
simple modification of the matrix-free binary search procedure:

if α < 1 then (output(L); α := α/(1 − α))
else (output(R); α := α − 1) .

(These steps are to be repeated infinitely many times, or until we get tired.) 
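To experiment with this procedure, it helps to use exact rational arithmetic, because floating-point rounding errors are magnified by the repeated steps. The Python sketch below (names ours) assumes α is given as a `fractions.Fraction`. As a stand-in for e it uses a 39-decimal truncation. That choice is an assumption of this example, not something from the text, but the truncation lies so close to e that the first few dozen letters coincide with those of e.

```python
from fractions import Fraction


def sb_letters(alpha, count):
    """The first `count` letters produced by the loop
    if alpha < 1: output L, alpha := alpha/(1 - alpha)
    else:         output R, alpha := alpha - 1."""
    out = []
    for _ in range(count):
        if alpha < 1:
            out.append("L")
            alpha = alpha / (1 - alpha)
        else:
            out.append("R")
            alpha = alpha - 1
    return "".join(out)


# A rational approximation to e, truncated to 39 decimal places (an assumption
# of this sketch; it agrees with e far beyond what 32 letters can detect).
E_APPROX = Fraction("2.718281828459045235360287471352662497757")

print(sb_letters(E_APPROX, 32))
```

The printed string can be compared with the 32 letters of e given earlier. A rational input such as `Fraction(5, 7)` produces LRRL followed by RLLL..., the finite representation with $RL^\infty$ appended.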

If α is rational, the infinite representation obtained in this way is the same as
before but with $RL^\infty$ appended at the right of α’s (finite) representation. For
example, if α = 1, we get RLLL..., corresponding to the infinite sequence of
fractions $\frac11, \frac21, \frac32, \frac43, \frac54, \ldots$, which approach 1 in the limit. This situation is
exactly analogous to ordinary binary notation, if we think of L as 0 and R as 1:
Just as every real number x in [0 .. 1) has an infinite binary representation
$(.b_1b_2b_3\ldots)_2$ not ending with all 1’s, every real number α in [0 .. ∞) has
an infinite Stern-Brocot representation $B_1B_2B_3\ldots$ not ending with all R’s.
Thus we have a one-to-one order-preserving correspondence between [0 .. 1)
and [0 .. ∞) if we let 0 ↔ L and 1 ↔ R.

There’s an intimate relationship between Euclid’s algorithm and the 
Stern-Brocot representations of rationals. Given α = m/n, we get ⌊m/n⌋
R’s, then ⌊n/(m mod n)⌋ L’s, then ⌊(m mod n)/(n mod (m mod n))⌋ R’s,
and so on. These numbers m mod n, n mod (m mod n), ... are just the values
examined in Euclid’s algorithm. (A little fudging is needed at the end
to make sure that there aren’t infinitely many R’s.) We will explore this
relationship further in Chapter 6.
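The connection with Euclid’s algorithm can be checked by machine. In this Python sketch (names ours), the quotients ⌊m/n⌋, ⌊n/(m mod n)⌋, ... become alternating runs of R’s and L’s. The “little fudging” is modeled, as our own reading of it, by shortening the final run by one. With that adjustment, the result agrees with the finite representation computed by the matrix-free procedure.

```python
def euclid_quotients(m, n):
    """Quotients floor(m/n), floor(n/(m mod n)), ... from Euclid's algorithm."""
    quotients = []
    while n:
        quotients.append(m // n)
        m, n = n, m % n
    return quotients


def string_from_quotients(quotients):
    """Alternate runs of R's and L's whose lengths are the quotients,
    with the last run shortened by one (our model of the end correction)."""
    runs = quotients[:-1] + [quotients[-1] - 1]
    return "".join(("R" if i % 2 == 0 else "L") * q for i, q in enumerate(runs))


def matrix_free(m, n):
    """The matrix-free procedure, for comparison."""
    out = []
    while m != n:
        if m < n:
            out.append("L")
            n -= m
        else:
            out.append("R")
            m -= n
    return "".join(out)
```

For 5/7 the quotients are 0, 1, 2, 2, which give R⁰L¹R²L¹ = LRRL after the final run is shortened.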

4.6 ‘MOD’: THE CONGRUENCE RELATION 

Modular arithmetic is one of the main tools provided by number 
theory. We got a glimpse of it in Chapter 3 when we used the binary operation 
‘mod’, usually as one operation amidst others in an expression. In this chapter 
we will use ‘mod’ also with entire equations, for which a slightly different 
notation is more convenient: 

$$a \equiv b \pmod m \iff a \bmod m = b \bmod m. \tag{4.35}$$

For example, 9 ≡ −16 (mod 5), because 9 mod 5 = 4 = (−16) mod 5. The
formula ‘a ≡ b (mod m)’ can be read “a is congruent to b modulo m.” The
definition makes sense when a, b, and m are arbitrary real numbers, but we
almost always use it with integers only.
Since x mod m differs from x by a multiple of m, we can understand
congruences in another way:

$$a \equiv b \pmod m \iff a - b \text{ is a multiple of } m. \tag{4.36}$$

For if a mod m = b mod m, then the definition of ‘mod’ in (3.21) tells us
that a − b = a mod m + km − (b mod m + lm) = (k − l)m for some integers
k and l. Conversely if a − b = km, then a = b if m = 0; otherwise

$$a \bmod m = a - \lfloor a/m \rfloor m = b + km - \lfloor (b + km)/m \rfloor m = b - \lfloor b/m \rfloor m = b \bmod m.$$

The characterization of ≡ in (4.36) is often easier to apply than (4.35). For
example, we have 8 ≡ 23 (mod 5) because 8 − 23 = −15 is a multiple of 5; we
don’t have to compute both 8 mod 5 and 23 mod 5.
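Both characterizations are easy to test. In the following Python sketch (names ours), `mod` follows the definition x mod y = x − y⌊x/y⌋ with x mod 0 = x. For integers, Python’s built-in `%` already rounds the quotient toward −∞, so it agrees with this definition whenever y ≠ 0.

```python
def mod(x, y):
    """x mod y as defined in (3.21): x - y*floor(x/y), and x mod 0 = x.
    For int arguments Python's % already uses the floor convention."""
    return x if y == 0 else x % y


def congruent(a, b, m):
    """(4.35): a ≡ b (mod m) iff a mod m == b mod m."""
    return mod(a, m) == mod(b, m)


def congruent_by_difference(a, b, m):
    """(4.36): a ≡ b (mod m) iff a - b is a multiple of m (0 is the only multiple of 0)."""
    return a == b if m == 0 else (a - b) % m == 0
```

The two functions agree on every triple tried, including negative and zero moduli. For example, `congruent(9, -16, 5)` is true because both residues are 4.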

The congruence sign ‘≡’ looks conveniently like ‘=’, because congruences
are almost like equations. For example, congruence is an equivalence
relation; that is, it satisfies the reflexive law ‘a ≡ a’, the symmetric law
‘a ≡ b ⇒ b ≡ a’, and the transitive law ‘a ≡ b ≡ c ⇒ a ≡ c’.
All these properties are easy to prove, because any relation ‘≡’ that satisfies
‘a ≡ b ⟺ f(a) = f(b)’ for some function f is an equivalence relation. (In
our case, f(x) = x mod m.) Moreover, we can add and subtract congruent
elements without losing congruence:

$$a \equiv b \text{ and } c \equiv d \implies a + c \equiv b + d \pmod m;$$
$$a \equiv b \text{ and } c \equiv d \implies a - c \equiv b - d \pmod m.$$

For if a − b and c − d are both multiples of m, so are (a + c) − (b + d) =
(a − b) + (c − d) and (a − c) − (b − d) = (a − b) − (c − d). Incidentally, it
isn’t necessary to write ‘(mod m)’ once for every appearance of ‘≡’; if the
modulus is constant, we need to name it only once in order to establish the
context. This is one of the great conveniences of congruence notation.
Multiplication works too, provided that we are dealing with integers:

$$a \equiv b \text{ and } c \equiv d \implies ac \equiv bd \pmod m, \qquad \text{integers } b, c.$$

Proof: ac − bd = (a − b)c + b(c − d). Repeated application of this multiplication
property now allows us to take powers:

$$a \equiv b \implies a^n \equiv b^n \pmod m, \qquad \text{integers } a, b; \text{ integer } n \ge 0.$$


“I feel fine today
modulo a slight 
headache.’’ 

— The Hacker’s 
Dictionary [337] 
For example, since 2 ≡ −1 (mod 3), we have $2^n \equiv (-1)^n$ (mod 3); this means
that $2^n - 1$ is a multiple of 3 if and only if n is even.

Thus, most of the algebraic operations that we customarily do with equations
can also be done with congruences. Most, but not all. The operation
of division is conspicuously absent. If ad ≡ bd (mod m), we can’t always
conclude that a ≡ b. For example, 3·2 ≡ 5·2 (mod 4), but 3 ≢ 5.

We can salvage the cancellation property for congruences, however, in 
the common case that d and m are relatively prime: 

$$ad \equiv bd \iff a \equiv b \pmod m, \qquad \text{integers } a, b, d, m \text{ and } d \perp m. \tag{4.37}$$

For example, it’s legit to conclude from 15 ≡ 35 (mod m) that 3 ≡ 7 (mod m),
unless the modulus m is a multiple of 5.

To prove this property, we use the extended gcd law (4.5) again, finding
d' and m' such that d'd + m'm = 1. Then if ad ≡ bd we can multiply
both sides of the congruence by d', obtaining ad'd ≡ bd'd. Since d'd ≡ 1,
we have ad'd ≡ a and bd'd ≡ b; hence a ≡ b. This proof shows that the
number d' acts almost like 1/d when congruences are considered (mod m); 
therefore we call it the “inverse of d modulo m.” 
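The proof is constructive: the inverse d' comes out of the extended Euclidean algorithm behind (4.5). Here is a minimal iterative sketch in Python (names ours). Since Python 3.8, the built-in `pow(d, -1, m)` computes the same inverse, and the last line of the block uses it as a cross-check.

```python
def extended_gcd(d, m):
    """Return (g, x, y) with x*d + y*m == g == gcd(d, m), for d, m >= 0."""
    old_r, r = d, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(d, m):
    """d' with d'd ≡ 1 (mod m); requires d ⊥ m."""
    g, d_prime, _ = extended_gcd(d, m)
    if g != 1:
        raise ValueError("d and m are not relatively prime")
    return d_prime % m


assert inverse_mod(3, 7) == pow(3, -1, 7) == 5
```

With the inverse in hand, cancelling d from ad ≡ bd is just multiplication by `inverse_mod(d, m)`.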

Another way to apply division to congruences is to divide the modulus 
as well as the other numbers: 

$$ad \equiv bd \pmod{md} \iff a \equiv b \pmod m, \qquad \text{for } d \ne 0. \tag{4.38}$$

This law holds for all real a, b, d, and m, because it depends only on the
distributive law (a mod m)d = ad mod md: We have a mod m = b mod m
⟺ (a mod m)d = (b mod m)d ⟺ ad mod md = bd mod md. Thus,
for example, from 3·2 ≡ 5·2 (mod 4) we conclude that 3 ≡ 5 (mod 2).

We can combine (4.37) and (4.38) to get a general law that changes the 
modulus as little as possible: 

$$ad \equiv bd \pmod m \iff a \equiv b \left(\bmod \frac{m}{\gcd(d, m)}\right), \qquad \text{integers } a, b, d, m. \tag{4.39}$$

For we can multiply ad ≡ bd by d', where d'd + m'm = gcd(d, m); this gives
the congruence a·gcd(d, m) ≡ b·gcd(d, m) (mod m), which can be divided
by gcd(d, m).
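Law (4.39) is easy to spot-check exhaustively over small values. This Python sketch (ours) compares both sides for all a, b, d in a small range and moduli up to 12. It relies on the fact that, for integers and k > 0, x ≡ y (mod k) means `(x - y) % k == 0`.

```python
from math import gcd


def law_4_39_holds(limit=12):
    """Compare ad ≡ bd (mod m) with a ≡ b (mod m/gcd(d, m)) exhaustively."""
    for m in range(1, limit + 1):
        for d in range(-limit, limit + 1):
            reduced = m // gcd(d, m)       # gcd(0, m) == m, so reduced == 1 when d == 0
            for a in range(-limit, limit + 1):
                for b in range(-limit, limit + 1):
                    left = (a * d - b * d) % m == 0
                    right = (a - b) % reduced == 0
                    if left != right:
                        return False
    return True
```

The example 3·2 ≡ 5·2 (mod 4) fits: gcd(2, 4) = 2, so the law yields only 3 ≡ 5 (mod 2).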

Let’s look a bit further into this idea of changing the modulus. If we 
know that a ≡ b (mod 100), then we also must have a ≡ b (mod 10), or
modulo any divisor of 100. It’s stronger to say that a − b is a multiple of 100
than to say that it’s a multiple of 10. In general,

$$a \equiv b \pmod{md} \implies a \equiv b \pmod m, \qquad \text{integer } d, \tag{4.40}$$

because any multiple of md is a multiple of m. 

Conversely, if we know that a ≡ b with respect to two small moduli, can
we conclude that a ≡ b with respect to a larger one? Yes; the rule is

$$a \equiv b \pmod m \text{ and } a \equiv b \pmod n \iff a \equiv b \pmod{\operatorname{lcm}(m, n)}, \qquad \text{integers } m, n > 0. \tag{4.41}$$

For example, if we know that a ≡ b modulo 12 and 18, we can safely conclude
that a ≡ b (mod 36). The reason is that if a − b is a common multiple of m
and n, it is a multiple of lcm(m, n). This follows from the principle of unique
factorization.
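A quick exhaustive check of (4.41) in Python is easy to write. In this sketch (ours), the least common multiple is computed from `gcd`, because `math.lcm` exists only in Python 3.9 and later. Only the difference a − b matters, so the loop ranges over differences.

```python
from math import gcd


def lcm(m, n):
    return m * n // gcd(m, n)


def law_4_41_holds(limit=20):
    """a ≡ b (mod m) and a ≡ b (mod n)  <=>  a ≡ b (mod lcm(m, n)), for m, n > 0.
    Only diff = a - b matters, so we range over differences."""
    for m in range(1, limit + 1):
        for n in range(1, limit + 1):
            k = lcm(m, n)
            for diff in range(-limit * limit, limit * limit + 1):
                both = diff % m == 0 and diff % n == 0
                if both != (diff % k == 0):
                    return False
    return True
```

For instance, `lcm(12, 18)` is 36, matching the example in the text.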

The special case m ⊥ n of this law is extremely important, because
lcm(m, n) = mn when m and n are relatively prime. Therefore we will state
it explicitly:

$$a \equiv b \pmod{mn} \iff a \equiv b \pmod m \text{ and } a \equiv b \pmod n, \qquad \text{if } m \perp n. \tag{4.42}$$

For example, a ≡ b (mod 100) if and only if a ≡ b (mod 25) and a ≡ b
(mod 4). Saying this another way, if we know x mod 25 and x mod 4, then
we have enough facts to determine x mod 100. This is a special case of the
Chinese Remainder Theorem (see exercise 30), so called because it was
discovered by Sun Tsu in China, about A.D. 350.

The moduli m and n in (4.42) can be further decomposed into relatively 
prime factors until every distinct prime has been isolated. Therefore 

$$a \equiv b \pmod m \iff a \equiv b \pmod{p^{m_p}} \text{ for all } p,$$

if the prime factorization (4.11) of m is $\prod_p p^{m_p}$. Congruences modulo powers
of primes are the building blocks for all congruences modulo integers. 

4.7 INDEPENDENT RESIDUES 

One of the important applications of congruences is a residue number
system, in which an integer x is represented as a sequence of residues (or
remainders) with respect to moduli that are prime to each other:

$$\operatorname{Res}(x) = (x \bmod m_1, \ldots, x \bmod m_r), \qquad \text{if } m_j \perp m_k \text{ for } 1 \le j < k \le r.$$

Knowing $x \bmod m_1, \ldots, x \bmod m_r$ doesn’t tell us everything about x. But
it does allow us to determine x mod m, where m is the product $m_1 \ldots m_r$.
In practical applications we’ll often know that x lies in a certain range; then
we’ll know everything about x if we know x mod m and if m is large enough.

Modulitos?

For example, let’s look at a small case of a residue number system that 
has only two moduli, 3 and 5: 


x mod 15 |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
x mod 3  |  0  1  2  0  1  2  0  1  2  0  1  2  0  1  2
x mod 5  |  0  1  2  3  4  0  1  2  3  4  0  1  2  3  4

For example, the Mersenne prime $2^{31} - 1$ works well.


Each ordered pair (x mod 3, x mod 5) is different, because x mod 3 = y mod 3 
and x mod 5 = y mod 5 if and only if x mod 15 = y mod 15. 

We can perform addition, subtraction, and multiplication on the two 
components independently, because of the rules of congruences. For example,
if we want to multiply 7 = (1, 2) by 13 = (1, 3) modulo 15, we calculate
1·1 mod 3 = 1 and 2·3 mod 5 = 1. The answer is (1, 1) = 1; hence 7·13 mod 15
must equal 1. Sure enough, it does.
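The componentwise arithmetic is simple to express. This Python sketch (names ours, with the moduli fixed at 3 and 5 as in the table) represents x by Res(x) and combines residue pairs component by component.

```python
MODULI = (3, 5)


def res(x):
    """Res(x) = (x mod 3, x mod 5)."""
    return tuple(x % m for m in MODULI)


def res_add(u, v):
    return tuple((a + b) % m for a, b, m in zip(u, v, MODULI))


def res_mul(u, v):
    return tuple((a * b) % m for a, b, m in zip(u, v, MODULI))


print(res(7), res(13), res_mul(res(7), res(13)))  # prints: (1, 2) (1, 3) (1, 1)
```

The result (1, 1) is Res(1), in agreement with 7·13 mod 15 = 1. The 15 pairs Res(0), ..., Res(14) are all different.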

This independence principle is useful in computer applications, because 
different components can be worked on separately (for example, by different 
computers). If each modulus $m_k$ is a distinct prime $p_k$, chosen to be slightly
less than $2^{31}$, then a computer whose basic arithmetic operations handle
integers in the range $[-2^{31} .. 2^{31})$ can easily compute sums, differences, and
products modulo $p_k$. A set of r such primes makes it possible to add,
subtract, and multiply “multiple-precision numbers” of up to almost 31r bits,
and the residue system makes it possible to do this faster than if such large 
numbers were added, subtracted, or multiplied in other ways. 

We can even do division, in appropriate circumstances. For example, 
suppose we want to compute the exact value of a large determinant of integers. 
The result will be an integer D, and bounds on |D| can be given based on the 
size of its entries. But the only fast ways known for calculating determinants 
require division, and this leads to fractions (and loss of accuracy, if we resort
to binary approximations). The remedy is to evaluate $D \bmod p_k = D_k$, for
various large primes $p_k$. We can safely divide modulo $p_k$ unless the divisor
happens to be a multiple of $p_k$. That’s very unlikely, but if it does happen we
can choose another prime. Finally, knowing $D_k$ for sufficiently many primes,
we’ll have enough information to determine D.

But we haven’t explained how to get from a given sequence of residues 
$(x \bmod m_1, \ldots, x \bmod m_r)$ back to x mod m. We’ve shown that this
conversion can be done in principle, but the calculations might be so formidable
that they might rule out the idea in practice. Fortunately, there is a
reasonably simple way to do the job, and we can illustrate it in the situation
(x mod 3, x mod 5) shown in our little table. The key idea is to solve the
problem in the two cases (1, 0) and (0, 1); for if (1, 0) = a and (0, 1) = b, then
(x, y) = (ax + by) mod 15, since congruences can be multiplied and added.

In our case a = 10 and b = 6, by inspection of the table; but how could
we find a and b when the moduli are huge? In other words, if m ⊥ n, what
is a good way to find numbers a and b such that the equations

a mod m = 1,  a mod n = 0,  b mod m = 0,  b mod n = 1

all hold? Once again, (4.5) comes to the rescue: With Euclid’s algorithm, we
can find m' and n' such that

$$m'm + n'n = 1 .$$

Therefore we can take a = n'n and b = m'm, reducing them both mod mn
if desired. 
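Putting the pieces together, here is a Python sketch (names ours) of the two-modulus reconstruction. It finds m' and n' with an iterative extended Euclidean algorithm, forms a = n'n and b = m'm reduced mod mn, and recombines residues as (ax + by) mod mn.

```python
def extended_gcd(u, v):
    """Return (g, s, t) with s*u + t*v == g == gcd(u, v), for u, v >= 0."""
    old_r, r, old_s, s, old_t, t = u, v, 1, 0, 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def crt_basis(m, n):
    """Return (a, b) with a ≡ 1, 0 and b ≡ 0, 1 modulo (m, n); requires m ⊥ n."""
    g, m_prime, n_prime = extended_gcd(m, n)   # m'm + n'n == 1
    if g != 1:
        raise ValueError("moduli must be relatively prime")
    return (n_prime * n) % (m * n), (m_prime * m) % (m * n)


def from_residues(x, y, m, n):
    """The number in [0, mn) whose residues mod m and mod n are x and y."""
    a, b = crt_basis(m, n)
    return (a * x + b * y) % (m * n)
```

For the moduli 3 and 5 this yields a = 10 and b = 6, as read off the table. For 25 and 4, it recovers x mod 100 from x mod 25 and x mod 4.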

Further tricks are needed in order to minimize the calculations when the 
moduli are large; the details are beyond the scope of this book, but they can 
be found in [208, page 274]. Conversion from residues to the corresponding
original numbers is feasible, but it is sufficiently slow that we save total time 
only if a sequence of operations can all be done in the residue number system 
before converting back. 

Let’s firm up these congruence ideas by trying to solve a little problem: 
How many solutions are there to the congruence 

$$x^2 \equiv 1 \pmod m, \tag{4.43}$$

if we consider two solutions x and x' to be the same when x ≡ x'?

According to the general principles explained earlier, we should consider
first the case that m is a prime power, $p^k$, where k > 0. Then the congruence
$x^2 \equiv 1$ can be written

$$(x - 1)(x + 1) \equiv 0 \pmod{p^k},$$
so p must divide either x − 1 or x + 1, or both. But p can’t divide both
x − 1 and x + 1 unless p = 2; we’ll leave that case for later. If p > 2, then
$p^k \backslash (x - 1)(x + 1) \iff p^k \backslash (x - 1)$ or $p^k \backslash (x + 1)$; so there are exactly two
solutions, x ≡ +1 and x ≡ −1.

All primes are odd except 2, which is the oddest of all.

Mathematicians love to say that things are trivial.

The case p = 2 is a little different. If $2^k \backslash (x - 1)(x + 1)$ then either x − 1
or x + 1 is divisible by 2 but not by 4, so the other one must be divisible
by $2^{k-1}$. This means that we have four solutions when k ≥ 3, namely x ≡ ±1
and x ≡ $2^{k-1} \pm 1$. (For example, when $p^k = 8$ the four solutions are x ≡ 1, 3,
5, 7 (mod 8); it’s often useful to know that the square of any odd integer has
the form 8n + 1.)

Now $x^2 \equiv 1 \pmod m$ if and only if $x^2 \equiv 1 \pmod{p^{m_p}}$ for all primes p
with $m_p > 0$ in the complete factorization of m. Each prime is independent
of the others, and there are exactly two possibilities for $x \bmod p^{m_p}$ except
when p = 2. Therefore if m has exactly r different prime divisors, the total
number of solutions to $x^2 \equiv 1$ is $2^r$, except for a correction when m is even.
The exact number in general is

$$2^{\,r + [8\backslash m] + [4\backslash m] - [2\backslash m]} . \tag{4.44}$$

For example, there are four “square roots of unity modulo 12,” namely 1, 5,
7, and 11. When m = 15 the four are those whose residues mod 3 and mod 5
are ±1, namely (1, 1), (1, 4), (2, 1), and (2, 4) in the residue number system.
These solutions are 1, 4, 11, and 14 in the ordinary (decimal) number system.
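Formula (4.44) is easy to confirm by brute force for small m. This Python sketch (ours) counts the solutions directly and compares the count with $2^{r + [8\backslash m] + [4\backslash m] - [2\backslash m]}$, finding r by trial division.

```python
def square_roots_of_unity(m):
    """All x in [0, m) with x^2 ≡ 1 (mod m)."""
    return [x for x in range(m) if (x * x - 1) % m == 0]


def distinct_prime_divisors(m):
    """r, the number of different primes dividing m (trial division)."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)


def predicted_count(m):
    """(4.44); Python's True/False act as the Iverson brackets 1/0."""
    r = distinct_prime_divisors(m)
    return 2 ** (r + (m % 8 == 0) + (m % 4 == 0) - (m % 2 == 0))
```

For m = 12 the brute-force list is [1, 5, 7, 11], and for m = 15 it is [1, 4, 11, 14], matching the text.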

4.8 ADDITIONAL APPLICATIONS 

There’s some unfinished business left over from Chapter 3: We wish 
to prove that the m numbers 

$$0 \bmod m,\; n \bmod m,\; 2n \bmod m,\; \ldots,\; (m - 1)n \bmod m \tag{4.45}$$

consist of precisely d copies of the m/d numbers

$$0,\; d,\; 2d,\; \ldots,\; m - d$$

in some order, where d = gcd(m, n). For example, when m = 12 and n = 8
we have d = 4, and the numbers are 0, 8, 4, 0, 8, 4, 0, 8, 4, 0, 8, 4.

The first part of the proof, to show that we get d copies of the first
m/d values, is now trivial. We have

$$jn \equiv kn \pmod m \iff j(n/d) \equiv k(n/d) \pmod{m/d}$$

by (4.38); hence we get d copies of the values that occur when 0 ≤ k < m/d.
Now we must show that those m/d numbers are {0, d, 2d, ..., m − d}
in some order. Let’s write m = m'd and n = n'd. Then kn mod m =
d(kn' mod m'), by the distributive law (3.23); so the values that occur when
0 ≤ k < m' are d times the numbers

$$0 \bmod m',\; n' \bmod m',\; 2n' \bmod m',\; \ldots,\; (m' - 1)n' \bmod m' .$$

But we know that m' ⊥ n' by (4.27); we’ve divided out their gcd. Therefore
we need only consider the case d = 1, namely the case that m and n are 
relatively prime. 

So let’s assume that m ⊥ n. In this case it’s easy to see that the numbers
(4.45) are just {0, 1, ..., m − 1} in some order, by using the “pigeonhole
principle.” This principle states that if m pigeons are put into m pigeonholes,
there is an empty hole if and only if there’s a hole with more than one pigeon.
(Dirichlet’s box principle, proved in exercise 3.8, is similar.) We know that
the numbers (4.45) are distinct, because

$$jn \equiv kn \pmod m \iff j \equiv k \pmod m$$

when m ⊥ n; this is (4.37). Therefore the m different numbers must fill all the
pigeonholes 0, 1, ..., m − 1. Therefore the unfinished business of Chapter 3
is finished.

The proof is complete, but we can prove even more if we use a direct 
method instead of relying on the indirect pigeonhole argument. If m ⊥ n and
if a value j ∈ [0 .. m) is given, we can explicitly compute k ∈ [0 .. m) such
that kn mod m = j by solving the congruence

$$kn \equiv j \pmod m$$

for k. We simply multiply both sides by n', where m'm + n'n = 1, to get

$$k \equiv jn' \pmod m;$$

hence k = jn' mod m.
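Both facts can be watched in action. In this Python sketch (ours), the multiset check confirms the d-copies statement about (4.45). The function `solve_for_k` computes k = jn' mod m, taking n' from Python’s built-in `pow(n, -1, m)` (available since Python 3.8) instead of running Euclid’s algorithm by hand.

```python
from collections import Counter
from math import gcd


def multiples_mod(m, n):
    """The m numbers (4.45): 0 mod m, n mod m, ..., (m-1)n mod m."""
    return [k * n % m for k in range(m)]


def d_copies_hold(m, n):
    """True if (4.45) is d copies of 0, d, ..., m - d, where d = gcd(m, n)."""
    d = gcd(m, n)
    return Counter(multiples_mod(m, n)) == Counter({v: d for v in range(0, m, d)})


def solve_for_k(n, j, m):
    """The k in [0, m) with k*n mod m == j, assuming m ⊥ n."""
    n_prime = pow(n, -1, m)       # n' with n'n ≡ 1 (mod m)
    return j * n_prime % m
```

For m = 12 and n = 8, `multiples_mod` reproduces the sequence 0, 8, 4, 0, 8, 4, ... from the text.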

We can use the facts just proved to establish an important result discov- 
ered by Pierre de Fermat in 1640. Fermat was a great mathematician who 
contributed to the discovery of calculus and many other parts of mathemat- 
ics. He left notebooks containing dozens of theorems stated without proof, 
and each of those theorems has subsequently been verified — with the possible 
exception of one that became the most famous of all, because it baffled the 
world’s best mathematicians for 350 years. The famous one, called “Fermat’s 
Last Theorem,” states that 


$$a^n + b^n \ne c^n \tag{4.46}$$





NEWS FLASH 


Euler [115] conjectured that

$a^4 + b^4 + c^4 \ne d^4$,

but Noam Elkies [92] found infinitely many solutions in August, 1987.

Now Roger Frye has done an exhaustive computer search, proving (after
about 110 hours on a Connection Machine) that the only solution with
d < 1000000 is:
$95800^4 + 217519^4 + 414560^4 = 422481^4$.


“... laquelle proposition, si elle est vraie, est de très grand usage.”
(“... which proposition, if it is true, is of very great use.”)

— P. de Fermat [121] 


for all positive integers a, b, c, and n, when n > 2. (Of course there are lots 
of solutions to the equations a + b = c and $a^2 + b^2 = c^2$.) Andrew Wiles
culminated many years of research by announcing a proof of (4.46) in 1993; 
his proof is currently being subjected to intense scrutiny. 

Fermat’s theorem of 1640 is much easier to verify. It’s now called Fermat’s 
Little Theorem (or just Fermat’s theorem, for short), and it states that 

$$n^{p-1} \equiv 1 \pmod p, \qquad \text{if } n \perp p. \tag{4.47}$$

Proof: As usual, we assume that p denotes a prime. We know that the
p − 1 numbers n mod p, 2n mod p, ..., (p − 1)n mod p are the numbers 1, 2,
..., p − 1 in some order. Therefore if we multiply them together we get

$$\begin{aligned} n \cdot (2n) \cdot \ldots \cdot ((p - 1)n) &\equiv (n \bmod p) \cdot (2n \bmod p) \cdot \ldots \cdot ((p - 1)n \bmod p) \\ &\equiv (p - 1)!, \end{aligned}$$

where the congruence is modulo p. This means that

$$(p - 1)!\, n^{p-1} \equiv (p - 1)! \pmod p,$$

and we can cancel the (p − 1)! since it’s not divisible by p. QED.

An alternative form of Fermat’s theorem is sometimes more convenient: 

$$n^p \equiv n \pmod p, \qquad \text{integer } n. \tag{4.48}$$

This congruence holds for all integers n. The proof is easy: If n ⊥ p we
simply multiply (4.47) by n. If not, $p \backslash n$, so $n^p \equiv 0 \equiv n$.
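Both forms of Fermat’s theorem can be spot-checked with Python’s three-argument `pow`, which performs modular exponentiation. The sketch below (ours) tests (4.47) and (4.48) for a few primes. It is an illustration, not a proof.

```python
def fermat_4_47(p):
    """n^(p-1) ≡ 1 (mod p) for every n in 1..p-1."""
    return all(pow(n, p - 1, p) == 1 for n in range(1, p))


def fermat_4_48(p, span=50):
    """n^p ≡ n (mod p) for every integer n in [-span, span]."""
    return all(pow(n, p, p) == n % p for n in range(-span, span + 1))


SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 97]
print(all(fermat_4_47(p) and fermat_4_48(p) for p in SMALL_PRIMES))  # prints: True
```

A composite modulus usually fails the first test; for example, `fermat_4_47(15)` is false because $2^{14} \bmod 15 = 4$.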

In the same year that he discovered (4.47), Fermat wrote a letter to 
Mersenne, saying he suspected that the number 

$$f_n = 2^{2^n} + 1$$

would turn out to be prime for all n ≥ 0. He knew that the first five cases
gave primes:

$$2^1 + 1 = 3;\quad 2^2 + 1 = 5;\quad 2^4 + 1 = 17;\quad 2^8 + 1 = 257;\quad 2^{16} + 1 = 65537;$$

but he couldn’t see how to prove that the next case, $2^{32} + 1 = 4294967297$,
would be prime. 

It’s interesting to note that Fermat could have proved that $2^{32} + 1$ is not
prime, using his own recently discovered theorem, if he had taken time to
perform a few dozen multiplications: We can set n = 3 in (4.47), deducing
that

$$3^{2^{32}} \equiv 1 \pmod{2^{32} + 1}, \qquad \text{if } 2^{32} + 1 \text{ is prime}.$$
And it’s possible to test this relation by hand, beginning with 3 and squaring
32 times, keeping only the remainders mod $2^{32} + 1$. First we have $3^2 = 9$,
then $3^{2^2} = 81$, then $3^{2^3} = 6561$, and so on until we reach

$$3^{2^{32}} \equiv 3029026160 \pmod{2^{32} + 1}.$$

The result isn’t 1, so $2^{32} + 1$ isn’t prime. This method of disproof gives us
no clue about what the factors might be, but it does prove that factors exist.
(They are 641 and 6700417, first found by Euler in 1732 [102].)

If $3^{2^{32}}$ had turned out to be 1, modulo $2^{32} + 1$, the calculation wouldn’t
have proved that $2^{32} + 1$ is prime; it just wouldn’t have disproved it. But
exercise 47 discusses a converse to Fermat’s theorem by which we can prove 
that large prime numbers are prime, without doing an enormous amount of 
laborious arithmetic. 
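Fermat’s hand computation takes a few lines in Python. The sketch below (ours) squares 32 times modulo $2^{32} + 1$, keeping only remainders as described. It then cross-checks the result against the built-in `pow` and confirms Euler’s factorization. It deliberately asserts only that the result is not 1, which is all the argument needs.

```python
F5 = 2**32 + 1          # 4294967297


def repeated_squaring(base, times, modulus):
    """Compute base^(2^times) mod modulus by squaring `times` times."""
    x = base % modulus
    for _ in range(times):
        x = x * x % modulus
    return x


residue = repeated_squaring(3, 32, F5)
assert residue == pow(3, 2**32, F5)   # same value via built-in modular pow
assert residue != 1                   # so 2^32 + 1 cannot be prime
assert 641 * 6700417 == F5            # Euler's factors
print(residue)
```

The converse direction really does fail. For example, `pow(2, 340, 341)` equals 1 even though 341 = 11 · 31, so a result of 1 would have settled nothing.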

We proved Fermat’s theorem by cancelling (p − 1)! from both sides of a
congruence. It turns out that (p − 1)! is always congruent to −1, modulo p;
this is part of a classical result known as Wilson’s theorem:

$$(n - 1)! \equiv -1 \pmod n \iff n \text{ is prime}, \qquad \text{if } n > 1. \tag{4.49}$$

One half of this theorem is trivial: If n > 1 is not prime, it has a prime
divisor p that appears as a factor of (n − 1)!, so (n − 1)! cannot be congruent
to −1. (If (n − 1)! were congruent to −1 modulo n, it would also be congruent
to −1 modulo p, but it isn’t.)

The other half of Wilson’s theorem states that (p — 1)! = — 1 (mod p). 
We can prove this half by pairing up numbers with their inverses mod p. If 
n 1 p, we know that there exists n' such that 

n/n = 1 (mod p) ; 

here n' is the inverse of n, and n is also the inverse of n/. Any two inverses 
of n must be congruent to each other, since nn' = nn" implies n' = n". 

Now suppose we pair up each number between 1 and $p-1$ with its inverse.
Since the product of a number and its inverse is congruent to 1, the product
of all the numbers in all pairs of inverses is also congruent to 1; so it seems
that $(p-1)!$ is congruent to 1. Let’s check, say for $p = 5$. We get $4! = 24$;
but this is congruent to 4, not 1, modulo 5. Oops! What went wrong? Let’s
take a closer look at the inverses:

$$1' = 1, \quad 2' = 3, \quad 3' = 2, \quad 4' = 4.$$

Ah so; 2 and 3 pair up but 1 and 4 don’t — they’re their own inverses. 

To resurrect our analysis we must determine which numbers are their
own inverses. If x is its own inverse, then $x^2 \equiv 1 \pmod p$; and we have
already proved that this congruence has exactly two roots when $p > 2$. (If
$p = 2$ it’s obvious that $(p-1)! \equiv -1$, so we needn’t worry about that case.)
The roots are 1 and $p-1$, and the other numbers (between 1 and $p-1$) pair
up; hence

$$(p-1)! \equiv 1 \cdot (p-1) \equiv -1,$$

as desired.


If this is Fermat’s
Little Theorem,
the other one was
last but not least.


If p is prime, is p'
prime prime?

Unfortunately, we can’t compute factorials efficiently, so Wilson’s theorem
is of no use as a practical test for primality. It’s just a theorem.
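To see both the theorem and its impracticality concretely, here is a small Python sketch (all function names are ours). It tests (4.49) by multiplying $2 \cdot 3 \cdots (n-1)$ modulo n, which takes $n-2$ multiplications and is hopeless for large n. It also lists the inverse pairs mod p used in the proof, via Python’s three-argument `pow` with exponent $-1$ (available in Python 3.8 and later).

```python
def wilson_says_prime(n: int) -> bool:
    """Test (4.49): (n-1)! ≡ -1 (mod n), for n > 1. Needs n-2 multiplications."""
    acc = 1
    for k in range(2, n):
        acc = (acc * k) % n
    return acc == n - 1          # n - 1 is the residue of -1 mod n

def is_prime_trial(n: int) -> bool:
    """Plain trial division, used only to cross-check the Wilson test."""
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

def inverse_pairs(p: int) -> dict:
    """Map each n in 1..p-1 to its inverse n' mod p (p prime)."""
    return {n: pow(n, -1, p) for n in range(1, p)}

assert all(wilson_says_prime(n) == is_prime_trial(n) for n in range(2, 200))
print(inverse_pairs(5))   # {1: 1, 2: 3, 3: 2, 4: 4}
print([n for n, inv in inverse_pairs(13).items() if n == inv])   # [1, 12]
```

The last line shows the two self-inverse residues 1 and $p-1$ that spoil the naive pairing argument.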


4.9 PHI AND MU 


“If N is a number
prime to x, and n is
the number of parts
prime to N, then
the power $x^n$
diminished by unity
will always be
divisible by the
number N.”

— L. Euler [111] 


How many of the integers $\{0, 1, \ldots, m-1\}$ are relatively prime to m?
This is an important quantity called $\varphi(m)$, the “totient” of m (so named by
J. J. Sylvester [347], a British mathematician who liked to invent new words).
We have $\varphi(1) = 1$, $\varphi(p) = p-1$, and $\varphi(m) < m-1$ for all composite
numbers m.

The $\varphi$ function is called Euler’s totient function, because Euler was the
first person to study it. Euler discovered, for example, that Fermat’s theorem
(4.47) can be generalized to nonprime moduli in the following way:

$$n^{\varphi(m)} \equiv 1 \pmod m, \qquad \text{if } n \perp m. \tag{4.50}$$

(Exercise 32 asks for a proof of Euler’s theorem.) 

If m is a prime power $p^k$, it’s easy to compute $\varphi(m)$, because $n \perp p^k \iff p \nmid n$. The multiples of p in $\{0, 1, \ldots, p^k-1\}$ are $\{0, p, 2p, \ldots, p^k-p\}$;
hence there are $p^{k-1}$ of them, and $\varphi(p^k)$ counts what is left:

$$\varphi(p^k) = p^k - p^{k-1}.$$


Notice that this formula properly gives $\varphi(p) = p-1$ when $k = 1$.
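A direct count is a handy way to sanity-check the prime-power formula. This Python sketch (the name `phi_count` is our own) counts the members of $\{0, 1, \ldots, m-1\}$ that are relatively prime to m, exactly as in the definition, and compares the result with $p^k - p^{k-1}$.

```python
from math import gcd

def phi_count(m: int) -> int:
    """phi(m) by definition: how many of 0, 1, ..., m-1 are relatively prime to m.
    (gcd(0, m) == m, so 0 is counted only when m == 1.)"""
    return sum(1 for n in range(m) if gcd(n, m) == 1)

# phi(p^k) = p^k - p^(k-1) for several small primes p and exponents k.
for p in (2, 3, 5, 7):
    for k in range(1, 5):
        assert phi_count(p**k) == p**k - p**(k - 1)

print(phi_count(1), phi_count(7), phi_count(8))   # 1 6 4
```

The count takes time proportional to m, so it is only a checking tool, not an efficient way to compute $\varphi$.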

If $m > 1$ is not a prime power, we can write $m = m_1 m_2$ where $m_1 \perp m_2$.
Then the numbers $0 \le n < m$ can be represented in a residue number system
as $(n \bmod m_1, n \bmod m_2)$. We have

$$n \perp m \iff n \bmod m_1 \perp m_1 \ \text{and}\ n \bmod m_2 \perp m_2$$

by (4.30) and (4.4). Hence, $n \bmod m$ is “good” if and only if $n \bmod m_1$
and $n \bmod m_2$ are both “good,” if we consider relative primality to be a
virtue. The total number of good values modulo m can now be computed,
recursively: It is $\varphi(m_1)\varphi(m_2)$, because there are $\varphi(m_1)$ good ways to choose
the first component $n \bmod m_1$ and $\varphi(m_2)$ good ways to choose the second
component $n \bmod m_2$ in the residue representation.
For example, $\varphi(12) = \varphi(4)\varphi(3) = 2 \cdot 2 = 4$, because n is prime to 12 if
and only if $n \bmod 4 = (1 \text{ or } 3)$ and $n \bmod 3 = (1 \text{ or } 2)$. The four values prime
to 12 are (1,1), (1,2), (3,1), (3,2) in the residue number system; they are
1, 5, 7, 11 in ordinary decimal notation. Euler’s theorem states that $n^4 \equiv 1 \pmod{12}$ whenever $n \perp 12$.
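The residue-number-system argument for $\varphi(12) = \varphi(4)\varphi(3)$ can be replayed in a few lines of Python. This is only a sketch for the specific split $12 = 4 \cdot 3$ used in the text. It lists the residues prime to 12, shows their $(n \bmod 4, n \bmod 3)$ pairs, and checks Euler’s theorem $n^4 \equiv 1 \pmod{12}$.

```python
from math import gcd

m1, m2 = 4, 3            # the coprime split of m = 12 used in the text
m = m1 * m2

good = [n for n in range(m) if gcd(n, m) == 1]
pairs = [(n % m1, n % m2) for n in good]
print(good)     # [1, 5, 7, 11]
print(pairs)    # [(1, 1), (1, 2), (3, 1), (3, 2)]

# Multiplicativity: the number of good residues mod 12 is phi(4) * phi(3).
phi_m1 = sum(1 for a in range(m1) if gcd(a, m1) == 1)
phi_m2 = sum(1 for b in range(m2) if gcd(b, m2) == 1)
assert len(good) == phi_m1 * phi_m2 == 4

# Euler's theorem (4.50) with phi(12) = 4.
assert all(pow(n, 4, m) == 1 for n in good)
```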

A function f(m) of positive integers is called multiplicative if $f(1) = 1$
and

$$f(m_1 m_2) = f(m_1) f(m_2) \qquad \text{whenever } m_1 \perp m_2. \tag{4.51}$$


“If A and B are
numbers prime to
each other, and the
number of parts
prime to A is a,
while the number
of parts prime to B
is b, then the number
of parts prime to
the product AB
will be ab.”

— L. Euler [111] 


We have just proved that $\varphi(m)$ is multiplicative. We’ve also seen another
instance of a multiplicative function earlier in this chapter: The number of
incongruent solutions to $x^2 \equiv 1 \pmod m$ is multiplicative. Still another
example is $f(m) = m^\alpha$ for any power α.

A multiplicative function is defined completely by its values at prime 
powers, because we can decompose any positive integer m into its prime- 
power factors, which are relatively prime to each other. The general formula 


$$f(m) = \prod_p f(p^{m_p}), \qquad \text{if } m = \prod_p p^{m_p} \tag{4.52}$$


holds if and only if f is multiplicative. 

In particular, this formula gives us the value of Euler’s totient function 
for general m: 


$$\varphi(m) = \prod_{p\backslash m} (p^{m_p} - p^{m_p - 1}) = m \prod_{p\backslash m} \Bigl(1 - \frac1p\Bigr). \tag{4.53}$$


For example, $\varphi(12) = (4-2)(3-1) = 12\bigl(1 - \frac12\bigr)\bigl(1 - \frac13\bigr)$.
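Formula (4.53) turns the totient into a computation over prime factors. The Python sketch below is an illustration under simple assumptions: it factors m by trial division (fine for small m, not for large ones), evaluates both forms of (4.53), the second with exact `fractions.Fraction` arithmetic, and compares them with a brute-force count. The helper names are ours.

```python
from fractions import Fraction
from math import gcd

def prime_exponents(m: int) -> dict:
    """Return {p: m_p} with m = product of p**m_p, by trial division."""
    exps, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            exps[p] = exps.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:                       # whatever is left is a prime
        exps[m] = exps.get(m, 0) + 1
    return exps

def phi_prime_powers(m: int) -> int:
    """First form of (4.53): product of (p^{m_p} - p^{m_p - 1}) over primes p dividing m."""
    result = 1
    for p, e in prime_exponents(m).items():
        result *= p**e - p**(e - 1)
    return result

def phi_fractional(m: int) -> int:
    """Second form of (4.53): m times the product of (1 - 1/p), with exact fractions."""
    value = Fraction(m)
    for p in prime_exponents(m):
        value *= 1 - Fraction(1, p)
    return int(value)

print(phi_prime_powers(12), phi_fractional(12))   # 4 4

for m in range(1, 500):
    brute = sum(1 for n in range(m) if gcd(n, m) == 1)
    assert phi_prime_powers(m) == phi_fractional(m) == brute
```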

Now let’s look at an application of the $\varphi$ function to the study of rational
numbers mod 1. We say that the fraction m/n is basic if $0 \le m < n$. Therefore
$\varphi(n)$ is the number of reduced basic fractions with denominator n; and
the Farey series $\mathcal{F}_n$ contains all the reduced basic fractions with denominator
n or less, as well as the non-basic fraction $\frac11$.

The set of all basic fractions with denominator 12, before reduction to 
lowest terms, is 


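$$\frac{0}{12}, \frac{1}{12}, \frac{2}{12}, \frac{3}{12}, \frac{4}{12}, \frac{5}{12}, \frac{6}{12}, \frac{7}{12}, \frac{8}{12}, \frac{9}{12}, \frac{10}{12}, \frac{11}{12}.$$

Reduction yields

$$\frac{0}{1}, \frac{1}{12}, \frac{1}{6}, \frac{1}{4}, \frac{1}{3}, \frac{5}{12}, \frac{1}{2}, \frac{7}{12}, \frac{2}{3}, \frac{3}{4}, \frac{5}{6}, \frac{11}{12},$$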
and we can group these fractions by their denominators:

$$\frac01;\quad \frac12;\quad \frac13, \frac23;\quad \frac14, \frac34;\quad \frac16, \frac56;\quad \frac1{12}, \frac5{12}, \frac7{12}, \frac{11}{12}.$$

What can we make of this? Well, every divisor d of 12 occurs as a denominator,
together with all $\varphi(d)$ of its numerators. The only denominators that
occur are divisors of 12. Thus

$$\varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12) = 12.$$

A similar thing will obviously happen if we begin with the unreduced fractions
$\frac0m, \frac1m, \ldots, \frac{m-1}m$ for any m, hence

$$\sum_{d\backslash m} \varphi(d) = m. \tag{4.54}$$

We said near the beginning of this chapter that problems in number
theory often require sums over the divisors of a number. Well, (4.54) is one
such sum, so our claim is vindicated. (We will see other examples.)
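The grouping of the twelfths by denominator works for any m, and that is the content of (4.54). The sketch below (helper names are hypothetical) reduces $0/m, 1/m, \ldots, (m-1)/m$ with Python’s `fractions.Fraction`, which always stores a fraction in lowest terms. It then checks that each denominator d appears exactly $\varphi(d)$ times and that the divisor sum of $\varphi$ equals m.

```python
from collections import defaultdict
from fractions import Fraction
from math import gcd

def phi(m: int) -> int:
    return sum(1 for n in range(m) if gcd(n, m) == 1)

def group_by_denominator(m: int) -> dict:
    """Reduce k/m for 0 <= k < m and group the reduced fractions by denominator."""
    groups = defaultdict(list)
    for k in range(m):
        f = Fraction(k, m)              # automatically reduced; 0/m becomes 0/1
        groups[f.denominator].append(f)
    return dict(groups)

g12 = group_by_denominator(12)
print(sorted(g12))                       # [1, 2, 3, 4, 6, 12]: the divisors of 12
assert all(len(fracs) == phi(d) for d, fracs in g12.items())

# (4.54): the sum of phi(d) over all divisors d of m is m.
for m in range(1, 300):
    assert sum(phi(d) for d in range(1, m + 1) if m % d == 0) == m
```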

Now here’s a curious fact: If f is any function such that the sum 

$$g(m) = \sum_{d\backslash m} f(d)$$

is multiplicative, then f itself is multiplicative. (This result, together with
(4.54) and the fact that $g(m) = m$ is obviously multiplicative, gives another
reason why $\varphi(m)$ is multiplicative.) We can prove this curious fact by induction
on m: The basis is easy because $f(1) = g(1) = 1$. Let $m > 1$, and
assume that $f(m_1 m_2) = f(m_1) f(m_2)$ whenever $m_1 \perp m_2$ and $m_1 m_2 < m$. If
$m = m_1 m_2$ and $m_1 \perp m_2$, we have

$$g(m_1 m_2) = \sum_{d\backslash m_1 m_2} f(d) = \sum_{d_1\backslash m_1} \sum_{d_2\backslash m_2} f(d_1 d_2),$$

and $d_1 \perp d_2$ since all divisors of $m_1$ are relatively prime to all divisors of $m_2$.
By the induction hypothesis, $f(d_1 d_2) = f(d_1) f(d_2)$ except possibly when
$d_1 = m_1$ and $d_2 = m_2$; hence we obtain

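$$\begin{aligned}
g(m_1 m_2) &= \Bigl(\sum_{d_1\backslash m_1} f(d_1) \sum_{d_2\backslash m_2} f(d_2)\Bigr) - f(m_1) f(m_2) + f(m_1 m_2) \\
&= g(m_1) g(m_2) - f(m_1) f(m_2) + f(m_1 m_2).
\end{aligned}$$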

But g is multiplicative, so $g(m_1 m_2) = g(m_1) g(m_2)$; comparing, we get $f(m_1 m_2) = f(m_1) f(m_2)$.
Conversely, if f(m) is multiplicative, the corresponding sum-over-divisors
function $g(m) = \sum_{d\backslash m} f(d)$ is always multiplicative. In fact, exercise 33
shows that even more is true. Hence the curious fact and its converse are
both facts.
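Both directions can be spot-checked numerically (a spot check, not a proof). In this Python sketch the test functions are hypothetical choices of ours. The multiplicative $f(m) = m^2$ has a multiplicative divisor sum. A deliberately non-multiplicative function has a divisor sum that fails the test too, just as the curious fact predicts.

```python
from math import gcd

def divisor_sum(f, m: int) -> int:
    """g(m) = sum of f(d) over the divisors d of m."""
    return sum(f(d) for d in range(1, m + 1) if m % d == 0)

def looks_multiplicative(h, limit: int = 30) -> bool:
    """Spot check: h(1) == 1 and h(a*b) == h(a)*h(b) for coprime a, b < limit."""
    if h(1) != 1:
        return False
    return all(
        h(a * b) == h(a) * h(b)
        for a in range(1, limit)
        for b in range(1, limit)
        if gcd(a, b) == 1
    )

def square(m: int) -> int:
    return m * m                        # multiplicative: f(m) = m^2

def sum_of_squared_divisors(m: int) -> int:
    return divisor_sum(square, m)       # its divisor sum

def shifted(m: int) -> int:
    return 1 if m == 1 else m + 1       # hypothetical, not multiplicative: shifted(6) = 7 != 3 * 4

def shifted_divisor_sum(m: int) -> int:
    return divisor_sum(shifted, m)

print(looks_multiplicative(square), looks_multiplicative(sum_of_squared_divisors))   # True True
print(looks_multiplicative(shifted), looks_multiplicative(shifted_divisor_sum))      # False False
```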

The Möbius function $\mu(m)$, named after the nineteenth-century mathematician
August Möbius who also had a famous band, can be defined for all
integers $m \ge 1$ by the equation

$$\sum_{d\backslash m} \mu(d) = [m = 1]. \tag{4.55}$$

This equation is actually a recurrence, since the left-hand side is a sum consisting
of $\mu(m)$ and certain values of $\mu(d)$ with $d < m$. For example, if we
plug in $m = 1, 2, \ldots, 12$ successively we can compute the first twelve values:


    m      1   2   3   4   5   6   7   8   9  10  11  12
    μ(m)   1  -1  -1   0  -1   1  -1   0   0   1  -1   0


Richard Dedekind [77] and Joseph Liouville [251] noticed the following
important “inversion principle” in 1857:

$$g(m) = \sum_{d\backslash m} f(d) \iff f(m) = \sum_{d\backslash m} \mu(d)\, g\Bigl(\frac md\Bigr). \tag{4.56}$$

According to this principle, the μ function gives us a new way to understand
any function f(m) for which we know $\sum_{d\backslash m} f(d)$.

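The proof of (4.56) uses two tricks (4.7) and (4.9) that we described near
the beginning of this chapter: If $g(m) = \sum_{d\backslash m} f(d)$ then

$$\begin{aligned}
\sum_{d\backslash m} \mu(d)\, g\Bigl(\frac md\Bigr) &= \sum_{d\backslash m} \mu\Bigl(\frac md\Bigr) g(d) \\
&= \sum_{d\backslash m} \mu\Bigl(\frac md\Bigr) \sum_{k\backslash d} f(k) \\
&= \sum_{k\backslash m} \sum_{d\backslash (m/k)} \mu\Bigl(\frac{m}{kd}\Bigr) f(k) \\
&= \sum_{k\backslash m} \sum_{d\backslash (m/k)} \mu(d)\, f(k) \\
&= \sum_{k\backslash m} [m/k = 1]\, f(k) = f(m).
\end{aligned}$$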


Now is a good time 
to try warmup 
exercise 11. 


The other half of (4.56) is proved similarly (see exercise 12).
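The Dedekind-Liouville principle (4.56) is easy to test numerically in both directions. In this Python sketch the test functions `f` and `g2` are arbitrary choices of ours (nothing in the text depends on them). μ is computed from the factorization formula that the text derives shortly as (4.57); any correct μ would do.

```python
def divisors(m: int) -> list:
    return [d for d in range(1, m + 1) if m % d == 0]

def mu(m: int) -> int:
    """Mobius function: (-1)^r if m is a product of r distinct primes, else 0."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:          # p^2 divides m
                return 0
            r += 1
        p += 1
    return (-1) ** (r + (m > 1))    # a leftover m > 1 is one more prime

def f(m: int) -> int:
    """Arbitrary test function (hypothetical choice)."""
    return m * m + 3 * (m % 7)

def g(m: int) -> int:
    return sum(f(d) for d in divisors(m))

# (4.56), left to right: recover f from its divisor sums g.
for m in range(1, 300):
    assert f(m) == sum(mu(d) * g(m // d) for d in divisors(m))

# (4.56), right to left: start from an arbitrary g2, define f2 by the Mobius sum,
# and check that summing f2 over divisors gives g2 back.
def g2(m: int) -> int:
    return 2 ** (m % 5) - m          # another hypothetical test function

def f2(m: int) -> int:
    return sum(mu(d) * g2(m // d) for d in divisors(m))

for m in range(1, 300):
    assert g2(m) == sum(f2(d) for d in divisors(m))
```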
Depending on how
fast you read. 


Relation (4.56) gives us a useful property of the Möbius function, and we
have tabulated the first twelve values; but what is the value of $\mu(m)$ when
m is large? How can we solve the recurrence (4.55)? Well, the function
$g(m) = [m = 1]$ is obviously multiplicative; after all, it’s zero except when
$m = 1$. So the Möbius function defined by (4.55) must be multiplicative, by
the curious fact we proved a minute or two ago. Therefore we can figure out
what $\mu(m)$ is if we compute $\mu(p^k)$.

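When $m = p^k$, (4.55) says that

$$\mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^k) = 0$$

for all $k \ge 1$, since the divisors of $p^k$ are $1, \ldots, p^k$. (Taking $k = 1$ gives
$\mu(p) = -\mu(1)$; then for $k = 2, 3, \ldots$ the terms before $\mu(p^k)$ already sum to 0.)
It follows that

$$\mu(p) = -1; \qquad \mu(p^k) = 0 \quad \text{for } k > 1.$$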


Therefore by (4.52), we have the general formula

$$\mu(m) = \prod_{p\backslash m} \mu(p^{m_p}) = \begin{cases} (-1)^r, & \text{if } m = p_1 p_2 \ldots p_r; \\ 0, & \text{if } m \text{ is divisible by some } p^2. \end{cases} \tag{4.57}$$

That’s μ.
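The two descriptions of μ can be compared directly. This Python sketch (function names are ours) computes μ by solving the recurrence (4.55) one m at a time, reproducing the table above. It then checks the result against the closed form (4.57), evaluated by trial-division factoring.

```python
def mobius_by_recurrence(n_max: int) -> list:
    """mu[1..n_max] from (4.55): mu(m) = [m == 1] - sum of mu(d) over d | m, d < m."""
    mu = [0] * (n_max + 1)
    for m in range(1, n_max + 1):
        s = sum(mu[d] for d in range(1, m) if m % d == 0)
        mu[m] = (1 if m == 1 else 0) - s
    return mu

def mobius_by_formula(m: int) -> int:
    """mu(m) from (4.57): (-1)^r if m is a product of r distinct primes, else 0."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:        # p^2 divided the original m
                return 0
            r += 1
        p += 1
    if m > 1:                     # one leftover prime factor
        r += 1
    return (-1) ** r

mu = mobius_by_recurrence(100)
print(mu[1:13])   # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
assert all(mu[m] == mobius_by_formula(m) for m in range(1, 101))
```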

If we regard (4.54) as a recurrence for the function $\varphi(m)$, we can solve
that recurrence by applying the Dedekind-Liouville rule (4.56). We get

$$\varphi(m) = \sum_{d\backslash m} \mu(d)\, \frac md. \tag{4.58}$$

For example, 

$$\begin{aligned}
\varphi(12) &= \mu(1)\cdot 12 + \mu(2)\cdot 6 + \mu(3)\cdot 4 + \mu(4)\cdot 3 + \mu(6)\cdot 2 + \mu(12)\cdot 1 \\
&= 12 - 6 - 4 + 0 + 2 + 0 = 4.
\end{aligned}$$


If m is divisible by r different primes, say $\{p_1, \ldots, p_r\}$, the sum (4.58) has
only $2^r$ nonzero terms, because the μ function is often zero. Thus we can see
that (4.58) checks with formula (4.53), which reads

$$\varphi(m) = m \prod_{p\backslash m} \Bigl(1 - \frac1p\Bigr);$$

if we multiply out the r factors $(1 - 1/p_j)$, we get precisely the $2^r$ nonzero
terms of (4.58). The advantage of the Möbius function is that it applies in
many situations besides this one.
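Formula (4.58) can be checked term by term. This sketch (names are ours) lists the six terms of the $\varphi(12)$ example and confirms (4.58) against a brute-force count for $m < 300$. It also checks that exactly $2^r$ of the terms are nonzero, where r is the number of distinct primes dividing m.

```python
from math import gcd

def divisors(m: int) -> list:
    return [d for d in range(1, m + 1) if m % d == 0]

def mu(m: int) -> int:
    """Mobius function via (4.57)."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            r += 1
        p += 1
    return (-1) ** (r + (m > 1))

def phi_via_mobius(m: int) -> int:
    """(4.58): phi(m) = sum over d | m of mu(d) * m/d."""
    return sum(mu(d) * (m // d) for d in divisors(m))

print([mu(d) * (12 // d) for d in divisors(12)])   # [12, -6, -4, 0, 2, 0]
print(phi_via_mobius(12))                           # 4

for m in range(1, 300):
    assert phi_via_mobius(m) == sum(1 for n in range(m) if gcd(n, m) == 1)
    r = sum(1 for p in divisors(m) if len(divisors(p)) == 2)   # distinct prime divisors
    assert sum(1 for d in divisors(m) if mu(d) != 0) == 2 ** r
```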
For example, let’s try to figure out how many fractions are in the Farey
series $\mathcal{F}_n$. This is the number of reduced fractions in $[0\,..\,1]$ whose denominators
do not exceed n, so it is 1 greater than $\Phi(n)$ where we define

$$\Phi(x) = \sum_{1\le k\le x} \varphi(k). \tag{4.59}$$


(We must add 1 to $\Phi(n)$ because of the final fraction $\frac11$.) The sum in (4.59)
looks difficult, but we can determine $\Phi(x)$ indirectly by observing that

$$\sum_{d\ge1} \Phi(x/d) = \tfrac12 \lfloor x \rfloor \lfloor 1 + x \rfloor \tag{4.60}$$

for all real $x \ge 0$. Why does this identity hold? Well, it’s a bit awesome yet
not really beyond our ken. There are $\frac12\lfloor x\rfloor\lfloor 1+x\rfloor$ basic fractions m/n with
$0 \le m < n \le x$, counting both reduced and unreduced fractions; that gives
us the right-hand side. The number of such fractions with $\gcd(m,n) = d$
is $\Phi(x/d)$, because such fractions are the reduced fractions $m'/n'$ with $0 \le m' < n' \le x/d$ after
replacing m by $m'd$ and n by $n'd$. So the left-hand side counts the same
fractions in a different way, and the identity must be true.

Let’s look more closely at the situation, so that equations (4.59) and
(4.60) become clearer. The definition of $\Phi(x)$ implies that $\Phi(x) = \Phi(\lfloor x\rfloor)$;
but it turns out to be convenient to define $\Phi(x)$ for arbitrary real values, not
just for integers. At integer values we have the table


    n      0   1   2   3   4   5   6   7   8   9  10  11  12
    φ(n)   -   1   1   2   2   4   2   6   4   6   4  10   4
    Φ(n)   0   1   2   4   6  10  12  18  22  28  32  42  46


and we can check (4.60) when x = 12: 

$$\begin{aligned}
&\Phi(12) + \Phi(6) + \Phi(4) + \Phi(3) + \Phi(2) + \Phi(2) + 6\cdot\Phi(1) \\
&\qquad = 46 + 12 + 6 + 4 + 2 + 2 + 6 = 78 = \tfrac12 \cdot 12 \cdot 13.
\end{aligned}$$

Amazing. 
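Identity (4.60) holds for every real $x \ge 0$, not only at integers, and a short computation bears this out. In this Python sketch (function names are ours) $\Phi$ is computed straight from its definition (4.59). The divisions $x/d$ use exact `fractions.Fraction` arithmetic, so floors are never spoiled by rounding.

```python
from fractions import Fraction
from math import floor, gcd

def phi(k: int) -> int:
    return sum(1 for n in range(k) if gcd(n, k) == 1)

def Phi(x) -> int:
    """(4.59): Phi(x) = sum of phi(k) for 1 <= k <= x; x may be any real >= 0."""
    return sum(phi(k) for k in range(1, floor(x) + 1))

def lhs_460(x) -> int:
    """Left side of (4.60): sum over d >= 1 of Phi(x/d). Terms with d > x are zero."""
    x = Fraction(x)                  # exact arithmetic, so x/d has no rounding error
    return sum(Phi(x / d) for d in range(1, floor(x) + 1))

def rhs_460(x) -> int:
    """Right side of (4.60): (1/2) floor(x) floor(1 + x)."""
    x = Fraction(x)
    return floor(x) * floor(1 + x) // 2

print([Phi(n) for n in range(13)])   # [0, 1, 2, 4, 6, 10, 12, 18, 22, 28, 32, 42, 46]
print(lhs_460(12), rhs_460(12))      # 78 78

for x in (0, 0.5, 1, 2.5, 7.25, 12, 30, Fraction(41, 3)):
    assert lhs_460(x) == rhs_460(x)
```

As a usage example, `1 + Phi(5)` gives 11, the number of fractions in the Farey series $\mathcal{F}_5$.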

Identity (4.60) can be regarded as an implicit recurrence for $\Phi(x)$; for
example, we’ve just seen that we could have used it to calculate $\Phi(12)$ from
certain values of $\Phi(m)$ with $m < 12$. And we can solve such recurrences by
using another beautiful property of the Möbius function:

$$g(x) = \sum_{d\ge1} f(x/d) \iff f(x) = \sum_{d\ge1} \mu(d)\, g(x/d). \tag{4.61}$$


(This extension to 
real values is a use- 
ful trick for many 
recurrences that 
arise in the analysis 
of algorithms.) 


In fact, Möbius
[273] invented his 
function because 
of (4.61), not (4.56). 
This inversion law holds for all functions f such that $\sum_{k,d\ge1} |f(x/kd)| < \infty$;
we can prove it as follows. Suppose $g(x) = \sum_{d\ge1} f(x/d)$. Then

$$\begin{aligned}
\sum_{d\ge1} \mu(d)\, g(x/d) &= \sum_{d\ge1} \mu(d) \sum_{k\ge1} f(x/kd) \\
&= \sum_{m\ge1} f(x/m) \sum_{d,k\ge1} \mu(d)\,[m = kd] \\
&= \sum_{m\ge1} f(x/m) \sum_{d\backslash m} \mu(d) = \sum_{m\ge1} f(x/m)\,[m = 1] = f(x).
\end{aligned}$$

The proof in the other direction is essentially the same. 
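The real-argument inversion (4.61) can be checked the same way as (4.56). In the sketch below the test function `f` is a hypothetical choice of ours that vanishes for $x < 1$. That makes every sum finite, so the convergence condition above holds trivially. Exact fractions keep $x/d$ free of rounding error.

```python
from fractions import Fraction
from math import floor

def mu(m: int) -> int:
    """Mobius function via (4.57)."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            r += 1
        p += 1
    return (-1) ** (r + (m > 1))

def f(x: Fraction) -> Fraction:
    """Hypothetical test function; it vanishes for x < 1, so every sum below is finite."""
    return x * x + 1 if x >= 1 else Fraction(0)

def g(x: Fraction) -> Fraction:
    """g(x) = sum over d >= 1 of f(x/d); only d <= x can contribute."""
    return sum((f(x / d) for d in range(1, floor(x) + 1)), Fraction(0))

def f_from_g(x: Fraction) -> Fraction:
    """Right side of (4.61): sum over d >= 1 of mu(d) * g(x/d)."""
    return sum((mu(d) * g(x / d) for d in range(1, floor(x) + 1)), Fraction(0))

for x in (Fraction(1), Fraction(5, 2), Fraction(37, 3), Fraction(20)):
    assert f_from_g(x) == f(x)
```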

So now we can solve the recurrence (4.60) for $\Phi(x)$:

$$\Phi(x) = \tfrac12 \sum_{d\ge1} \mu(d) \lfloor x/d \rfloor \lfloor 1 + x/d \rfloor. \tag{4.62}$$

This is always a finite sum. For example, 

$$\begin{aligned}
\Phi(12) &= \tfrac12 (12\cdot13 - 6\cdot7 - 4\cdot5 + 0 - 2\cdot3 + 2\cdot3 - 1\cdot2 + 0 + 0 + 1\cdot2 - 1\cdot2 + 0) \\
&= 78 - 21 - 10 - 3 + 3 - 1 + 1 - 1 = 46.
\end{aligned}$$

In Chapter 9 we’ll see how to use (4.62) to get a good approximation to $\Phi(x)$;
in fact, we’ll prove a result due to Mertens in 1874 [270],

$$\Phi(x) = \frac{3}{\pi^2} x^2 + O(x \log x).$$

Therefore the function $\Phi(x)$ grows “smoothly”; it averages out the erratic
behavior of $\varphi(k)$.
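Formula (4.62) gives a way to compute $\Phi(n)$ without computing any $\varphi(k)$, and it also lets one watch Mertens’s approximation take hold. In this Python sketch (names are ours) the direct values come from a standard totient sieve, used only as a cross-check. The final loop prints the ratio of $\Phi(n)$ to $3n^2/\pi^2$, which should drift toward 1 as n grows.

```python
from math import pi

def mu(m: int) -> int:
    """Mobius function via (4.57)."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            r += 1
        p += 1
    return (-1) ** (r + (m > 1))

def phi_table(n: int) -> list:
    """phi(0..n) by a sieve: for each prime p, multiply phi(k) by (1 - 1/p) when p | k."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # not yet reduced, so p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
    return phi

def Phi_by_462(n: int) -> int:
    """(4.62) for an integer n: (1/2) * sum over d <= n of mu(d) * floor(n/d) * floor(1 + n/d)."""
    total = 0
    for d in range(1, n + 1):                # terms with d > n vanish
        q = n // d                           # floor(n/d); floor(1 + n/d) is then q + 1
        total += mu(d) * q * (q + 1)
    return total // 2

phi = phi_table(300)
print(Phi_by_462(12))                        # 46
for n in range(1, 301):
    assert Phi_by_462(n) == sum(phi[1 : n + 1])

# Mertens: Phi(n) is close to (3/pi^2) n^2; the printed ratio should approach 1.
for n in (10, 100, 1000):
    print(n, Phi_by_462(n) / (3 * n * n / pi**2))
```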

In keeping with the tradition established last chapter, let’s conclude this 
chapter with a problem that illustrates much of what we’ve just seen and that 
also points ahead to the next chapter. Suppose we have beads of n different
colors; our goal is to count how many different ways there are to string them
into circular necklaces of length m. We can try to “name and conquer” this
problem by calling the number of possible necklaces N(m, n).

For example, with two colors of beads R and B, we can make necklaces
of length 4 in N(4, 2) = 6 different ways:

[figure: the six necklaces; cut at one point, they read RRRR, RRBR, RBBR, RBRB, RBBB, BBBB]

All other ways are equivalent to one of these, because rotations of a necklace
do not change it. However, reflections are considered to be different; in the
case m = 6, for example,

[figure: a six-bead necklace of R and B beads] is different from [figure: its mirror image].

The problem of counting these configurations was first solved by P. A. MacMahon
in 1892 [264].

There’s no obvious recurrence for N(m, n), but we can count the necklaces
by breaking them each into linear strings in m ways and considering the
resulting fragments. For example, when m = 4 and n = 2 we get


    RRRR   RRRR   RRRR   RRRR
    RRBR   RRRB   BRRR   RBRR
    RBBR   RRBB   BRRB   BBRR
    RBRB   BRBR   RBRB   BRBR
    RBBB   BRBB   BBRB   BBBR
    BBBB   BBBB   BBBB   BBBB


Each of the $n^m$ possible patterns appears at least once in this array of
$mN(m, n)$ strings, and some patterns appear more than once. How many
times does a pattern $a_0 \ldots a_{m-1}$ appear? That’s easy: It’s the number of
cyclic shifts $a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$ that produce the same pattern as the original
$a_0 \ldots a_{m-1}$. For example, BRBR occurs twice, because the four ways to
cut the necklace formed from BRBR produce four cyclic shifts (BRBR, RBRB,
BRBR, RBRB); two of these coincide with BRBR itself. This argument shows
that


$$\begin{aligned}
mN(m,n) &= \sum_{a_0,\ldots,a_{m-1}\in S_n} \ \sum_{0\le k<m} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}] \\
&= \sum_{0\le k<m} \ \sum_{a_0,\ldots,a_{m-1}\in S_n} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}].
\end{aligned}$$

Here $S_n$ is a set of n different colors.

Let’s see how many patterns satisfy $a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$,
when k is given. For example, if m = 12 and k = 8, we want to count the
number of solutions to

$$a_0a_1a_2a_3a_4a_5a_6a_7a_8a_9a_{10}a_{11} = a_8a_9a_{10}a_{11}a_0a_1a_2a_3a_4a_5a_6a_7.$$

This means $a_0 = a_8 = a_4$; $a_1 = a_9 = a_5$; $a_2 = a_{10} = a_6$; and $a_3 = a_{11} = a_7$.
So the values of $a_0$, $a_1$, $a_2$, and $a_3$ can be chosen in $n^4$ ways, and the
remaining a’s depend on them. Does this look familiar? In general, the
solution to

$$a_j = a_{(j+k) \bmod m}, \qquad \text{for } 0 \le j < m$$

makes us equate $a_j$ with $a_{(j+kl) \bmod m}$ for $l = 1, 2, \ldots$; and we know that
the multiples of k modulo m are $\{0, d, 2d, \ldots, m-d\}$, where $d = \gcd(k, m)$.
Therefore the general solution is to choose $a_0, \ldots, a_{d-1}$ independently and
then to set $a_j = a_{j-d}$ for $d \le j < m$. There are $n^d$ solutions.
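The count $n^{\gcd(k,m)}$ can be confirmed by brute force for small cases. In this Python sketch (the function name is ours) colors are the integers $0, \ldots, n-1$. We simply enumerate all $n^m$ strings and keep those with $a_j = a_{(j+k) \bmod m}$ for every j.

```python
from itertools import product
from math import gcd

def count_shift_invariant(m: int, n: int, k: int) -> int:
    """How many strings a_0 ... a_{m-1} over n colors satisfy a_j = a_{(j+k) mod m} for all j."""
    return sum(
        1
        for a in product(range(n), repeat=m)
        if all(a[j] == a[(j + k) % m] for j in range(m))
    )

# The text's example: m = 12, k = 8, so d = gcd(8, 12) = 4 and there are n^4 solutions.
print(count_shift_invariant(12, 2, 8))   # 16

for m in range(1, 7):
    for n in range(1, 4):
        for k in range(m):               # k = 0 is included; gcd(0, m) = m
            assert count_shift_invariant(m, n, k) == n ** gcd(k, m)
```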

We have just proved that

$$mN(m,n) = \sum_{0\le k<m} n^{\gcd(k,m)}.$$

This sum can be simplified, since it includes only terms $n^d$ where $d \backslash m$.
Substituting $d = \gcd(k, m)$ yields


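$$\begin{aligned}
N(m,n) &= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m} [d = \gcd(k,m)] \\
&= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m} [k/d \perp m/d] \\
&= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m/d} [k \perp m/d].
\end{aligned}$$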


(We are allowed to replace k/d by k because k must be a multiple of d.) 
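Finally, we have $\sum_{0\le k<m/d} [k \perp m/d] = \varphi(m/d)$ by definition, so we obtain
MacMahon’s formula:

$$N(m,n) = \frac1m \sum_{d\backslash m} n^d\, \varphi\Bigl(\frac md\Bigr) = \frac1m \sum_{d\backslash m} \varphi(d)\, n^{m/d}. \tag{4.63}$$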


When m = 4 and n = 2, for example, the number of necklaces is
$\frac14(1\cdot2^4 + 1\cdot2^2 + 2\cdot2^1) = 6$, just as we suspected.
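MacMahon’s formula can be checked against a brute-force count. In the sketch below (all names are ours) a necklace is represented by the lexicographically smallest of its m rotations. Two strings therefore count as the same necklace exactly when one is a rotation of the other. Reflections are not identified, matching the convention in the text.

```python
from itertools import product
from math import gcd

def phi(m: int) -> int:
    return sum(1 for n in range(m) if gcd(n, m) == 1)

def necklaces_brute(m: int, n: int) -> int:
    """Count length-m strings over n colors up to rotation (reflections stay distinct)."""
    canonical = set()
    for a in product(range(n), repeat=m):
        # Represent each necklace by its lexicographically smallest rotation.
        canonical.add(min(a[k:] + a[:k] for k in range(m)))
    return len(canonical)

def necklaces_macmahon(m: int, n: int) -> int:
    """MacMahon's formula (4.63): (1/m) * sum over d | m of phi(d) * n^(m/d)."""
    total = sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)
    assert total % m == 0        # the divisibility claim (4.64), for this m and n
    return total // m

print(necklaces_macmahon(4, 2))  # 6

for m in range(1, 9):
    for n in range(1, 4):
        assert necklaces_brute(m, n) == necklaces_macmahon(m, n)
```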

It’s not immediately obvious that the value N(m, n) defined by MacMahon’s
sum is an integer! Let’s try to prove directly that

$$\sum_{d\backslash m} \varphi(d)\, n^{m/d} \equiv 0 \pmod m, \tag{4.64}$$


without using the clue that this is related to necklaces. In the special case
that $m = p$ is prime, this congruence reduces to $n^p + (p-1)n \equiv 0 \pmod p$; that
is, it reduces to $n^p \equiv n$. We’ve seen in (4.48) that this congruence is an
alternative form of Fermat’s theorem. Therefore (4.64) holds when $m = p$;
we can regard it as a generalization of Fermat’s theorem to the case when the
modulus is not prime. (Euler’s generalization (4.50) is different.)

We’ve proved (4.64) for all prime moduli, so let’s look at the smallest 
case left, m = 4. We must prove that 

$$n^4 + n^2 + 2n \equiv 0 \pmod 4.$$

The proof is easy if we consider even and odd cases separately. If n is even,
all three terms on the left are congruent to 0 modulo 4, so their sum is too. If
n is odd, $n^4$ and $n^2$ are each congruent to 1, and 2n is congruent to 2; hence
the left side is congruent to $1 + 1 + 2$ and thus to 0 modulo 4, and we’re done.

Next, let’s be a bit daring and try m = 12. This value of m ought to 
be interesting because it has lots of factors, including the square of a prime, 
yet it is fairly small. (Also there’s a good chance we’ll be able to generalize a 
proof for 12 to a proof for general m.) The congruence we must prove is 

$$n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n \equiv 0 \pmod{12}.$$

Now what? By (4.42) this congruence holds if and only if it also holds modulo
3 and modulo 4. So let’s prove that it holds modulo 3. Our congruence
(4.64) holds for primes, so we have $n^3 + 2n \equiv 0 \pmod 3$. Careful
scrutiny reveals that we can use this fact to group terms of the larger sum:

$$\begin{aligned}
&n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n \\
&\qquad = (n^{12} + 2n^4) + (n^6 + 2n^2) + 2(n^3 + 2n) \\
&\qquad \equiv 0 + 0 + 2\cdot 0 \equiv 0 \pmod 3.
\end{aligned}$$

(The first two groups have the form $N^3 + 2N$, with $N = n^4$ and $N = n^2$.)

So it works modulo 3. 

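We’re half done. To prove congruence modulo 4 we use the same trick.
We’ve proved that $n^4 + n^2 + 2n \equiv 0 \pmod 4$, so we use this pattern to group:

$$\begin{aligned}
&n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n \\
&\qquad = (n^{12} + n^6 + 2n^3) + 2(n^4 + n^2 + 2n) \\
&\qquad \equiv 0 + 2\cdot 0 \equiv 0 \pmod 4.
\end{aligned}$$

(Here the first group has the form $N^4 + N^2 + 2N$ with $N = n^3$.)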

QED for the case m = 12. 
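One can also confirm (4.64) by machine for small m. This sketch (names are ours) evaluates MacMahon’s sum directly. Since the sum is a polynomial in n with integer coefficients, its residue mod m depends only on n mod m, so testing $n = 0, 1, \ldots, m-1$ covers every integer n for that m. The `functools.lru_cache` decorator just avoids recomputing $\varphi$.

```python
from functools import lru_cache
from math import gcd

@lru_cache(maxsize=None)
def phi(m: int) -> int:
    return sum(1 for n in range(m) if gcd(n, m) == 1)

def macmahon_sum(m: int, n: int) -> int:
    """Left side of (4.64): sum over d | m of phi(d) * n^(m/d)."""
    return sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)

# For m = 12 the sum is exactly the polynomial treated in the text.
for n in range(50):
    assert macmahon_sum(12, n) == n**12 + n**6 + 2*n**4 + 2*n**3 + 2*n**2 + 4*n

# (4.64) for every modulus m <= 60 and every residue n mod m.
for m in range(1, 61):
    assert all(macmahon_sum(m, n) % m == 0 for n in range(m))
```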

So far we’ve proved our congruence for prime m, for m = 4, and for m =
12. Now let’s try to prove it for prime powers. For concreteness we may
suppose that $m = p^3$ for some prime p. Then the left side of (4.64) is

$$\begin{aligned}
&n^{p^3} + \varphi(p)\, n^{p^2} + \varphi(p^2)\, n^p + \varphi(p^3)\, n \\
&\qquad = n^{p^3} + (p-1)\, n^{p^2} + (p^2 - p)\, n^p + (p^3 - p^2)\, n \\
&\qquad = (n^{p^3} - n^{p^2}) + p(n^{p^2} - n^p) + p^2(n^p - n) + p^3 n.
\end{aligned}$$


QED: Quite Easily 
Done. 
We can show that this is congruent to 0 modulo $p^3$ if we can prove that
$n^{p^3} - n^{p^2}$ is divisible by $p^3$, that $n^{p^2} - n^p$ is divisible by $p^2$, and that $n^p - n$
is divisible by p, because the whole thing will then be divisible by $p^3$. By the
alternative form of Fermat’s theorem we have $n^p \equiv n \pmod p$, so p divides
$n^p - n$; hence there is an integer q such that

$$n^p = n + pq.$$

Now we raise both sides to the pth power, expand the right side according to 
the binomial theorem (which we’ll meet in Chapter 5), and regroup, giving 

$$\begin{aligned}
n^{p^2} = (n + pq)^p &= n^p + (pq)^1 n^{p-1} \binom p1 + (pq)^2 n^{p-2} \binom p2 + \cdots \\
&= n^p + p^2 Q
\end{aligned}$$

for some other integer Q. We’re able to pull out a factor of $p^2$ here because
$\binom p1 = p$ in the second term, and because a factor of $(pq)^2$ appears in all the
terms that follow. So we find that $p^2$ divides $n^{p^2} - n^p$.

Again we raise both sides to the pth power, expand, and regroup, to get 

$$\begin{aligned}
n^{p^3} = (n^p + p^2 Q)^p &= n^{p^2} + (p^2 Q)^1 n^{p(p-1)} \binom p1 + (p^2 Q)^2 n^{p(p-2)} \binom p2 + \cdots \\
&= n^{p^2} + p^3 \mathcal{Q}
\end{aligned}$$

for yet another integer $\mathcal{Q}$. So $p^3$ divides $n^{p^3} - n^{p^2}$. This finishes the proof
for $m = p^3$, because we’ve shown that $p^3$ divides the left-hand side of (4.64).
Moreover we can prove by induction that

$$n^{p^k} = n^{p^{k-1}} + p^k \mathfrak{Q}$$

for some final integer $\mathfrak{Q}$ (final because we’re running out of fonts); hence

$$n^{p^k} \equiv n^{p^{k-1}} \pmod{p^k}, \qquad \text{for } k > 0. \tag{4.65}$$

Thus the left side of (4.64), which is

$$(n^{p^k} - n^{p^{k-1}}) + p(n^{p^{k-1}} - n^{p^{k-2}}) + \cdots + p^{k-1}(n^p - n) + p^k n,$$

is divisible by $p^k$ and so is congruent to 0 modulo $p^k$.
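Congruence (4.65) is also easy to spot-check by machine. This Python sketch (the name `check_465` is ours) tests every residue $n \bmod p^k$ for a few small primes p and exponents k; both sides depend only on $n \bmod p^k$, so that covers all integers n. The last line shows that primality of p matters: with 4 in place of p and $k = 1$, the base $n = 2$ gives $2^4 \equiv 0$ but $2^1 = 2 \pmod 4$.

```python
def check_465(p: int, k: int) -> bool:
    """True if n^(p^k) ≡ n^(p^(k-1)) (mod p^k) for every residue n mod p^k."""
    mod = p ** k
    return all(pow(n, p ** k, mod) == pow(n, p ** (k - 1), mod) for n in range(mod))

for p in (2, 3, 5, 7):
    for k in range(1, 5):
        assert check_465(p, k)

print(check_465(4, 1))   # False: 4 is not prime
```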

We’re almost there. Now that we’ve proved (4.64) for prime powers, all 
that remains is to prove it when $m = m_1 m_2$, where $m_1 \perp m_2$, assuming that
the congruence is true for $m_1$ and $m_2$. Our examination of the case m = 12,
which factored into instances of m = 3 and m = 4, encourages us to think 
that this approach will work. 
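We know that the φ function is multiplicative, so $\varphi(d_1 d_2) = \varphi(d_1)\varphi(d_2)$
when $d_1 \backslash m_1$ and $d_2 \backslash m_2$, and we can write

$$\begin{aligned}
\sum_{d\backslash m} \varphi(d)\, n^{m/d} &= \sum_{d_1\backslash m_1,\ d_2\backslash m_2} \varphi(d_1 d_2)\, n^{m_1 m_2/d_1 d_2} \\
&= \sum_{d_1\backslash m_1} \varphi(d_1) \Bigl( \sum_{d_2\backslash m_2} \varphi(d_2)\, (n^{m_1/d_1})^{m_2/d_2} \Bigr).
\end{aligned}$$

But the inner sum is congruent to 0 modulo $m_2$, because we’ve assumed that
(4.64) holds for $m_2$ (with $n^{m_1/d_1}$ in place of n); so the entire sum is congruent to 0 modulo $m_2$. By a
symmetric argument, we find that the entire sum is congruent to 0 modulo $m_1$
as well. Thus by (4.42) it’s congruent to 0 modulo m. QED.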


Exercises 

Warmups 

1 What is the smallest positive integer that has exactly k divisors, for
$1 \le k \le 6$?

2 Prove that $\gcd(m,n) \cdot \mathrm{lcm}(m,n) = m \cdot n$, and use this identity to express
$\mathrm{lcm}(m,n)$ in terms of $\mathrm{lcm}(n \bmod m, m)$, when $n \bmod m \ne 0$. Hint: Use
(4.12), (4.14), and (4.15).

3 Let π(x) be the number of primes not exceeding x. Prove or disprove:

$$\pi(x) - \pi(x-1) = [x \text{ is prime}].$$

4 What would happen if the Stern-Brocot construction started with the
five fractions (y, ^y, y) instead of with $(\frac01, \frac10)$?

5 Find simple formulas for $L^k$ and $R^k$, when L and R are the 2×2 matrices
of (4.33).

6 What does ‘$a \equiv b \pmod 0$’ mean?

7 Ten people numbered 1 to 10 are lined up in a circle as in the Josephus 
problem, and every mth person is executed. (The value of m may be 
much larger than 10.) Prove that the first three people to go cannot be 
10, k, and k + 1 (in this order), for any k.

8 The residue number system (x mod 3, x mod 5) considered in the text has
the curious property that 13 corresponds to (1, 3), which looks almost the
same. Explain how to find all instances of such a coincidence, without
calculating all fifteen pairs of residues. In other words, find all solutions
to the congruences

$$10x + y \equiv x \pmod 3, \qquad 10x + y \equiv y \pmod 5.$$

Hint: Use the facts that $10u + 6v \equiv u \pmod 3$ and $10u + 6v \equiv v \pmod 5$.
9 Show that $(3^{77} - 1)/2$ is odd and composite. Hint: What is $3^{77} \bmod 4$?

10 Compute $\varphi(999)$.

11 Find a function $\sigma(n)$ with the property that

$$g(n) = \sum_{0\le k\le n} f(k) \iff f(n) = \sum_{0\le k\le n} \sigma(k)\, g(n-k).$$

(This is analogous to the Möbius function; see (4.56).)

12 Simplify the formula $\sum_{d\backslash m} \sum_{k\backslash d} \mu(k)\, g(d/k)$.

13 A positive integer n is called squarefree if it is not divisible by $m^2$ for
any $m > 1$. Find a necessary and sufficient condition that n is squarefree,
a in terms of the prime-exponent representation (4.11) of n;
b in terms of $\mu(n)$.

Basics 

14 Prove or disprove: 

a $\gcd(km, kn) = k \gcd(m, n)$;
b $\mathrm{lcm}(km, kn) = k\, \mathrm{lcm}(m, n)$.

15 Does every prime occur as a factor of some Euclid number $e_n$?

16 What is the sum of the reciprocals of the first n Euclid numbers? 

17 Let $f_n$ be the “Fermat number” $2^{2^n} + 1$. Prove that $f_m \perp f_n$ if $m < n$.

18 Show that if $2^n + 1$ is prime then n is a power of 2.

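19 Prove the following identities when n is a positive integer:

$$\sum_{1\le k<n} \Bigl\lfloor \frac{\varphi(k+1)}{k} \Bigr\rfloor
= \sum_{1<m\le n} \Biggl\lfloor \Bigl( \sum_{1\le k<m} \Bigl\lfloor \frac{m/k}{\lceil m/k \rceil} \Bigr\rfloor \Bigr)^{-1} \Biggr\rfloor
= n - 1 - \sum_{k=1}^{n} \Bigl\lceil \Bigl\{ \frac{(k-1)! + 1}{k} \Bigr\} \Bigr\rceil.$$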

Hint: This is a trick question and the answer is pretty easy. 

20 For every positive integer n there’s a prime p such that $n < p \le 2n$. (This
is essentially “Bertrand’s postulate,” which Joseph Bertrand verified for
$n < 3000000$ in 1845 and Chebyshev proved for all n in 1850.) Use
Bertrand’s postulate to prove that there’s a constant $b \approx 1.25$ such that
the numbers

$$\lfloor 2^b \rfloor, \quad \lfloor 2^{2^b} \rfloor, \quad \lfloor 2^{2^{2^b}} \rfloor, \quad \ldots$$

are all prime.
21 Let $P_n$ be the nth prime number. Find a constant K such that

$$\lfloor (10^{n^2} K) \bmod 10^n \rfloor = P_n.$$

22 The number 1111111111111111111 is prime. Prove that, in any radix b,
$(11\ldots1)_b$ can be prime only if the number of 1’s is prime.

23 State a recurrence for $\rho(k)$, the ruler function in the text’s discussion of
$\varepsilon_2(n!)$. Show that there’s a connection between $\rho(k)$ and the disk that’s
moved at step k when an n-disk Tower of Hanoi is being transferred in
$2^n - 1$ moves, for $1 \le k \le 2^n - 1$.

24 Express $\varepsilon_p(n!)$ in terms of $\nu_p(n)$, the sum of the digits in the radix p
representation of n, thereby generalizing (4.24).

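25 We say that m exactly divides n, written $m \backslash\backslash n$, if $m \backslash n$ and $m \perp n/m$.
For example, in the text’s discussion of factorial factors, $p^{\varepsilon_p(n!)} \backslash\backslash n!$.
Prove or disprove the following:

a $k \backslash\backslash n$ and $m \backslash\backslash n \iff km \backslash\backslash n$, if $k \perp m$.
b For all $m, n > 0$, either $\gcd(m,n) \backslash\backslash m$ or $\gcd(m,n) \backslash\backslash n$.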

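26 Consider the sequence $\mathcal{S}_N$ of all nonnegative reduced fractions m/n such
that $mn \le N$. For example,

$$\mathcal{S}_{10} = \frac01, \frac1{10}, \frac19, \frac18, \frac17, \frac16, \frac15, \frac14, \frac13, \frac25, \frac12, \frac23, \frac11, \frac32, \frac21, \frac52, \frac31, \frac41, \frac51, \frac61, \frac71, \frac81, \frac91, \frac{10}1.$$

Is it true that $m'n - mn' = 1$ whenever m/n immediately precedes
$m'/n'$ in $\mathcal{S}_N$?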

27 Give a simple rule for comparing rational numbers based on their repre- 
sentations as L’s and R’s in the Stern-Brocot number system. 

28 The Stern-Brocot representation of π is

$$\pi = R^3 L^7 R^{15} L R^{292} L R L R^2 L R^3 L R^{14} L^2 R \ldots;$$

use it to find all the simplest rational approximations to π whose denominators
are less than 50. Is $\frac{22}{7}$ one of them?

29 The text describes a correspondence between binary real numbers
$x = (.b_1b_2b_3\ldots)_2$ in $[0\,..\,1)$ and Stern-Brocot real numbers $\alpha = B_1B_2B_3\ldots$
in $[0\,..\,\infty)$. If x corresponds to α and $x \ne 0$, what number corresponds
to $1 - x$?

30 Prove the following statement (the Chinese Remainder Theorem): Let
$m_1, \ldots, m_r$ be integers with $m_j \perp m_k$ for $1 \le j < k \le r$; let
$m = m_1 \ldots m_r$; and let $a_1, \ldots, a_r, A$ be integers. Then there is exactly one
integer a such that

$$a \equiv a_k \pmod{m_k} \quad \text{for } 1 \le k \le r, \qquad \text{and} \qquad A \le a < A + m.$$


Is this a test for 
strabismus? 


Look, ma, 
sideways addition. 
Why is “Euler”
pronounced “Oiler” 
when “Euclid” is 
“Yooklid”? 


31 A number in decimal notation is divisible by 3 if and only if the sum of 
its digits is divisible by 3. Prove this well-known rule, and generalize it. 

32 Prove Euler’s theorem (4.50) by generalizing the proof of (4.47). 

33 Show that if f(m) and g(m) are multiplicative functions, then so is
$h(m) = \sum_{d\backslash m} f(d)\, g(m/d)$.

34 Prove that (4.56) is a special case of (4.61). 

Homework exercises 

35 Let I(m, n) be a function that satisfies the relation 

$$I(m,n)\, m + I(n,m)\, n = \gcd(m,n),$$

when m and n are nonnegative integers with $m \ne n$. Thus, $I(m,n) = m'$
and $I(n,m) = n'$ in (4.5); the value of $I(m,n)$ is an inverse of m with
respect to n. Find a recurrence that defines $I(m,n)$.

36 Consider the set $Z(\sqrt{10}) = \{\, m + n\sqrt{10} \mid \text{integer } m, n \,\}$. The number
$m + n\sqrt{10}$ is called a unit if $m^2 - 10n^2 = \pm 1$, since it has an inverse
(that is, since $(m + n\sqrt{10}) \cdot \pm(m - n\sqrt{10}) = 1$). For example, $3 + \sqrt{10}$ is
a unit, and so is $19 - 6\sqrt{10}$. Pairs of cancelling units can be inserted into
any factorization, so we ignore them. Nonunit numbers of $Z(\sqrt{10})$ are
called prime if they cannot be written as a product of two nonunits. Show
that 2, 3, and $4 \pm \sqrt{10}$ are primes of $Z(\sqrt{10})$. Hint: If $2 = (k + l\sqrt{10})(m + n\sqrt{10})$
then $4 = (k^2 - 10l^2)(m^2 - 10n^2)$. Furthermore, the square
of any integer mod 10 is 0, 1, 4, 5, 6, or 9.

37 Prove (4.17). Hint: Show that $e_n - \frac12 = (e_{n-1} - \frac12)^2 + \frac14$, and consider
$2^{-n} \log(e_n - \frac12)$.

38 Prove that if $a \perp b$ and $a > b$ then

$$\gcd(a^m - b^m, a^n - b^n) = a^{\gcd(m,n)} - b^{\gcd(m,n)}, \qquad 0 < m < n.$$

(All variables are integers.) Hint: Use Euclid’s algorithm. 

39 Let S(m) be the smallest positive integer n for which there exists an
increasing sequence of integers

$$m = a_1 < a_2 < \cdots < a_t = n$$

such that $a_1 a_2 \ldots a_t$ is a perfect square. (If m is a perfect square, we
can let $t = 1$ and $n = m$.) For example, $S(2) = 6$ because the best such
sequence is $a_1 = 2$, $a_2 = 3$, $a_3 = 6$. We have

    n      1   2   3   4   5   6   7   8   9  10  11  12
    S(n)   1   6   8   4  10  12  14  15   9  18  22  20

Prove that $S(m) \ne S(m')$ whenever $0 < m < m'$.
40 If the radix p representation of n is $(a_m \ldots a_1 a_0)_p$, prove that

$$n!/p^{\varepsilon_p(n!)} \equiv (-1)^{\varepsilon_p(n!)}\, a_m! \ldots a_1!\, a_0! \pmod p.$$

(The left side is simply n! with all p factors removed. When $n = p$ this
reduces to Wilson’s theorem.)

41 a Show that if $p \bmod 4 = 3$, there is no integer n such that p divides
$n^2 + 1$. Hint: Use Fermat’s theorem.
b But show that if $p \bmod 4 = 1$, there is such an integer. Hint: Write
$(p-1)!$ as $\bigl(\prod_{k=1}^{(p-1)/2} k(p-k)\bigr)$ and think about Wilson’s theorem.

42 Consider two fractions m/n and m'/n' in lowest terms. Prove that when 
the sum $m/n + m'/n'$ is reduced to lowest terms, the denominator will be
$nn'$ if and only if $n \perp n'$. (In other words, $(mn' + m'n)/nn'$ will already
be in lowest terms if and only if n and n' have no common factor.)

43 There are $2^k$ nodes at level k of the Stern-Brocot tree, corresponding to
the matrices $L^k, L^{k-1}R, \ldots, R^k$. Show that this sequence can be obtained
by starting with $L^k$ and then multiplying successively by

$$\begin{pmatrix} 0 & -1 \\ 1 & 2\rho(n) + 1 \end{pmatrix}$$

for $1 \le n < 2^k$, where $\rho(n)$ is the ruler function.

44 Prove that a baseball player whose batting average is .316 must have
batted at least 19 times. (If he has m hits in n times at bat, then
$m/n \in [.3155\,..\,.3165)$.)

45 The number 9376 has the peculiar self-reproducing property that

$$9376^2 = 87909376.$$

How many 4-digit numbers x satisfy the equation $x^2 \bmod 10000 = x$?
How many n-digit numbers x satisfy the equation $x^2 \bmod 10^n = x$?

46 a Prove that if $n^j \equiv 1$ and $n^k \equiv 1 \pmod m$, then $n^{\gcd(j,k)} \equiv 1$.
b Show that $2^n \not\equiv 1 \pmod n$, if $n > 1$. Hint: Consider the least prime
factor of n.

47 Show that if $n^{m-1} \equiv 1 \pmod m$ and if $n^{(m-1)/p} \not\equiv 1 \pmod m$ for all
primes such that $p \backslash (m-1)$, then m is prime. Hint: Show that if this
condition holds, the numbers $n^k \bmod m$ are distinct, for $1 \le k < m$.

48 Generalize Wilson’s theorem (4.49) by ascertaining the value of the expression
$\bigl(\prod_{1\le n<m,\ n\perp m} n\bigr) \bmod m$, when $m > 1$.


Wilson’s theorem: 
“Martha, that boy is 
a menace.” 


Radio announcer:

“. . . pitcher Mark
LeChiffre hits a
two-run single!
Mark, who was
batting .080, gets
his second hit of
the year.”

Anything wrong? 
What are the roots
of disunity? 


49 Let R(N) be the number of pairs of integers (m, n) such that $0 < m \le N$,
$0 < n \le N$, and $m \perp n$.

a Express R(N) in terms of the Φ function.
b Prove that $R(N) = \sum_{d\ge1} \lfloor N/d \rfloor^2 \mu(d)$.

50 Let $m$ be a positive integer and let

$$\omega = e^{2\pi i/m} = \cos(2\pi/m) + i\sin(2\pi/m).$$

We say that $\omega$ is an $m$th root of unity, since $\omega^m = e^{2\pi i} = 1$. In fact,
each of the $m$ complex numbers $\omega^0, \omega^1, \ldots, \omega^{m-1}$ is an $m$th root of
unity, because $(\omega^k)^m = e^{2\pi k i} = 1$; therefore $z-\omega^k$ is a factor of the
polynomial $z^m - 1$, for $0 \le k < m$. Since these factors are distinct, the
complete factorization of $z^m - 1$ over the complex numbers must be

$$z^m - 1 = \prod_{0\le k<m} (z-\omega^k).$$

a Let $\Psi_m(z) = \prod_{0\le k<m,\; k\perp m} (z-\omega^k)$. (This polynomial of degree
$\varphi(m)$ is called the cyclotomic polynomial of order $m$.) Prove that

$$z^m - 1 = \prod_{d\backslash m} \Psi_d(z).$$

b Prove that $\Psi_m(z) = \prod_{d\backslash m} (z^d-1)^{\mu(m/d)}$.

Exam problems 

51 Prove Fermat’s theorem (4.48) by expanding $(1+1+\cdots+1)^p$ via the
multinomial theorem.

52 Let $n$ and $x$ be positive integers such that $x$ has no divisors $\le n$ (except 1),
and let $p$ be a prime number. Prove that at least $\lfloor n/p \rfloor$ of the numbers
$\{x-1, x^2-1, \ldots, x^{n-1}-1\}$ are multiples of $p$.

53 Find all positive integers $n$ such that $n \backslash \lfloor (n-1)!/(n+1) \rfloor$.

54 Determine the value of $1000! \bmod 10^{250}$ by hand calculation.

55 Let $P_n$ be the product of the first $n$ factorials, $\prod_{k=1}^{n} k!$. Prove that
$P_{2n}/P_n^4$ is an integer, for all positive integers $n$.

56 Show that

$$\Bigl(\prod_{k=1}^{2n-1} k^{\min(k,\,2n-k)}\Bigr) \Big/ \Bigl(\prod_{k=1}^{n-1} (2k+1)^{2n-2k-1}\Bigr)$$

is a power of 2.
57 Let $S(m,n)$ be the set of all integers $k$ such that

$$m \bmod k + n \bmod k \ge k.$$

For example, $S(7,9) = \{2,4,5,8,10,11,12,13,14,15,16\}$. Prove that

$$\sum_{k\in S(m,n)} \varphi(k) = mn.$$

Hint: Prove first that $\sum_{1\le m\le n} \sum_{d\backslash m} \varphi(d) = \sum_{d\ge1} \varphi(d) \lfloor n/d \rfloor$. Then
consider $\lfloor (m+n)/d \rfloor - \lfloor m/d \rfloor - \lfloor n/d \rfloor$.

58 Let $f(m) = \sum_{d\backslash m} d$. Find a necessary and sufficient condition that $f(m)$
is a power of 2.

Bonus problems 

59 Prove that if $x_1, \ldots, x_n$ are positive integers with $1/x_1 + \cdots + 1/x_n = 1$,
then $\max(x_1, \ldots, x_n) < e_n$. Hint: Prove the following stronger result by
induction: “If $1/x_1 + \cdots + 1/x_n + 1/\alpha = 1$, where $x_1, \ldots, x_n$ are positive
integers and $\alpha$ is a rational number $\ge \max(x_1, \ldots, x_n)$, then $\alpha+1 \le e_{n+1}$
and $x_1 \ldots x_n(\alpha+1) \le e_1 \ldots e_n e_{n+1}$.” (The proof is nontrivial.)

60 Prove that there’s a constant $P$ such that (4.18) gives only primes. You
may use the following (highly nontrivial) fact: There is a prime between 
p and p + p 0 , for all sufficiently large p, if 0 > y)-. 

61 Prove that if $m/n$, $m'/n'$, and $m''/n''$ are consecutive elements of $\mathcal{F}_N$,
then

$$m'' = \lfloor (n+N)/n' \rfloor m' - m, \qquad n'' = \lfloor (n+N)/n' \rfloor n' - n.$$

(This recurrence allows us to compute the elements of $\mathcal{F}_N$ in order, starting
with $\frac01$ and $\frac1N$.)
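The recurrence in exercise 61 is easy to try out. The Python sketch below is only an illustration, and its function names are our own. It generates $\mathcal{F}_N$ from $\frac01$ and $\frac1N$ using the stated recurrence and then compares the result with a brute-force sort. Agreement for small N is evidence, not the requested proof.

```python
from fractions import Fraction

def farey_by_recurrence(N):
    """Farey series F_N as (numerator, denominator) pairs, built with the
    recurrence of exercise 61 starting from 0/1 and 1/N."""
    m, n = 0, 1          # current fraction m/n
    m1, n1 = 1, N        # next fraction m'/n'
    seq = [(m, n), (m1, n1)]
    while (m1, n1) != (1, 1):
        q = (n + N) // n1                          # floor((n + N)/n')
        m, n, m1, n1 = m1, n1, q * m1 - m, q * n1 - n
        seq.append((m1, n1))
    return seq

def farey_brute_force(N):
    """The same series, found by sorting all fractions in [0, 1] with denominator <= N."""
    fracs = sorted({Fraction(a, b) for b in range(1, N + 1) for a in range(b + 1)})
    return [(f.numerator, f.denominator) for f in fracs]

print(farey_by_recurrence(5))
# [(0, 1), (1, 5), (1, 4), (1, 3), (2, 5), (1, 2), (3, 5), (2, 3), (3, 4), (4, 5), (1, 1)]
```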

62 What binary number corresponds to $e$, in the binary $\leftrightarrow$ Stern–Brocot
correspondence? (Express your answer as an infinite sum; you need not 
evaluate it in closed form.) 

63 Using only the methods of this chapter, show that if Fermat’s Last Theorem
(4.46) were false, the least $n$ for which it fails would have to be
prime. (You may assume that (4.46) holds when $n = 4$.) Furthermore,
if $a^p + b^p = c^p$ is the smallest counterexample, show that

$$a + b = \begin{cases} m^p, & \text{if } p \nmid c;\\ p^{p-1} m^p, & \text{if } p \backslash c, \end{cases}$$

for some integer $m$. Thus $c \ge m^p/2$ must be really huge. Hint: Let
$x = a + b$, and note that $\gcd\bigl(x, (a^p + (x-a)^p)/x\bigr) = \gcd(x, pa^{p-1})$.
64 The Peirce sequence $\mathcal{P}_N$ of order $N$ is an infinite string of fractions
separated by ‘<’ or ‘=’ signs, containing all the nonnegative fractions
$m/n$ with $m \ge 0$ and $n \le N$ (including fractions that are not reduced).
It is defined recursively by starting with

$$\mathcal{P}_1 = \tfrac01 < \tfrac11 < \tfrac21 < \tfrac31 < \tfrac41 < \tfrac51 < \tfrac61 < \tfrac71 < \tfrac81 < \tfrac91 < \tfrac{10}{1} < \cdots.$$

For $N \ge 1$, we form $\mathcal{P}_{N+1}$ by inserting two symbols just before the $kN$th
symbol of $\mathcal{P}_N$, for all $k > 0$. The two inserted symbols are

$$\frac{k-1}{N+1},\ =, \quad\text{if } kN \text{ is odd};\qquad \mathcal{P}_{N,kN},\ \frac{k-1}{N+1}, \quad\text{if } kN \text{ is even}.$$

Here $\mathcal{P}_{N,j}$ denotes the $j$th symbol of $\mathcal{P}_N$, which will be either ‘<’ or ‘=’
when $j$ is even; it will be a fraction when $j$ is odd. For example,

$$\begin{aligned}
\mathcal{P}_2 &= \tfrac02 = \tfrac01 < \tfrac12 < \tfrac22 = \tfrac11 < \tfrac32 < \tfrac42 = \tfrac21 < \tfrac52 < \tfrac62 = \tfrac31 < \tfrac72 < \tfrac82 = \tfrac41 < \tfrac92 < \tfrac{10}{2} = \tfrac51 < \cdots;\\
\mathcal{P}_3 &= \tfrac02 = \tfrac03 = \tfrac01 < \tfrac13 < \tfrac12 < \tfrac23 < \tfrac22 = \tfrac33 = \tfrac11 < \tfrac43 < \tfrac32 < \tfrac53 < \tfrac42 = \tfrac63 = \tfrac21 < \tfrac73 < \tfrac52 < \cdots;\\
\mathcal{P}_4 &= \tfrac02 = \tfrac04 = \tfrac03 = \tfrac01 < \tfrac14 < \tfrac13 < \tfrac24 = \tfrac12 < \tfrac23 < \tfrac34 < \tfrac22 = \tfrac44 = \tfrac33 = \tfrac11 < \tfrac54 < \tfrac43 < \tfrac64 = \cdots;\\
\mathcal{P}_5 &= \tfrac02 = \tfrac04 = \tfrac05 = \tfrac03 = \tfrac01 < \tfrac15 < \tfrac14 < \tfrac13 < \tfrac25 < \tfrac24 = \tfrac12 < \tfrac35 < \tfrac23 < \tfrac34 < \tfrac45 < \tfrac22 = \tfrac44 = \cdots;\\
\mathcal{P}_6 &= \tfrac02 = \tfrac04 = \tfrac06 = \tfrac05 = \tfrac03 = \tfrac01 < \tfrac16 < \tfrac15 < \tfrac14 < \tfrac26 = \tfrac13 < \tfrac25 < \tfrac24 = \tfrac36 = \tfrac12 < \tfrac35 < \tfrac46 = \cdots.
\end{aligned}$$

(Equal elements occur in a slightly peculiar order.) Prove that the ‘<’
and ‘=’ signs defined by the rules above correctly describe the relations
between adjacent fractions in the Peirce sequence.
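The insertion rule is mechanical enough to run. The Python sketch below is our own illustration and is not part of the exercise. It builds a finite prefix of $\mathcal{P}_N$ by applying the rule literally. The starting cutoff $\frac01 < \cdots < \frac{10}{1}$ (the hypothetical `limit` parameter) is an arbitrary choice. The sketch then checks each sign against the actual comparison of its two neighbors. A finite check like this supports the claim but does not replace the requested proof.

```python
from fractions import Fraction

def peirce(N, limit=10):
    """Finite prefix of the Peirce sequence P_N.

    Fractions are (numerator, denominator) pairs at odd 1-based positions;
    the signs '<' and '=' sit at even positions. P_1 starts as
    0/1 < 1/1 < ... < limit/1.
    """
    seq = []
    for m in range(limit + 1):
        if m:
            seq.append('<')
        seq.append((m, 1))
    for order in range(1, N):                # build P_{order+1} from P_order
        new = []
        for j, symbol in enumerate(seq, start=1):
            if j % order == 0:               # j = k * order: insert before symbol j
                k = j // order
                frac = (k - 1, order + 1)
                new.extend([frac, '='] if j % 2 else [symbol, frac])
            new.append(symbol)
        seq = new
    return seq

def show(seq):
    """Render a prefix as text, e.g. '0/2 = 0/1 < 1/2'."""
    return ' '.join(s if isinstance(s, str) else f"{s[0]}/{s[1]}" for s in seq)

def signs_are_correct(seq):
    """True if every sign matches the comparison of the fractions around it."""
    for i in range(0, len(seq) - 2, 2):      # 0-based even positions hold fractions
        a, sign, b = Fraction(*seq[i]), seq[i + 1], Fraction(*seq[i + 2])
        expected = '<' if a < b else '=' if a == b else '>'
        if sign != expected:
            return False
    return True

print(show(peirce(3)[:17]))   # 0/2 = 0/3 = 0/1 < 1/3 < 1/2 < 2/3 < 2/2 = 3/3 = 1/1
print(all(signs_are_correct(peirce(N)) for N in range(1, 9)))
```

Because every insertion lands before an existing symbol, each finite prefix produced this way is a genuine prefix of the infinite sequence. This means the sign check is meaningful all the way to the end of the list.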

Research problems 

65 Are the Euclid numbers $e_n$ all squarefree?

66 Are the Mersenne numbers $2^p - 1$ all squarefree?

67 Prove or disprove that $\max_{1\le j<k\le n} a_k/\gcd(a_j, a_k) \ge n$, for all sequences
of integers $0 < a_1 < \cdots < a_n$.

68 Is there a constant $Q$ such that $\lfloor Q^{2^n} \rfloor$ is prime for all $n \ge 0$?

69 Let $P_n$ denote the $n$th prime. Prove or disprove that $P_{n+1} - P_n = O(\log P_n)^2$.

70 Does $\epsilon_3(n!) = \epsilon_2(n!)/2$ for infinitely many $n$?

71 Prove or disprove: If $k \ne 1$ there exists $n > 1$ such that $2^n \equiv k \pmod n$.
Are there infinitely many such $n$?

72 Prove or disprove: For all integers $a$, there exist infinitely many $n$ such
that $\varphi(n) \backslash (n+a)$.
73 If the $\Phi(n)+1$ terms of the Farey series

$$\mathcal{F}_n = \langle \mathcal{F}_n(0), \mathcal{F}_n(1), \ldots, \mathcal{F}_n(\Phi(n)) \rangle$$

were fairly evenly distributed, we would expect $\mathcal{F}_n(k) \approx k/\Phi(n)$. Therefore
the sum $D(n) = \sum_{k=0}^{\Phi(n)} \bigl|\mathcal{F}_n(k) - k/\Phi(n)\bigr|$ measures the “deviation
of $\mathcal{F}_n$ from uniformity.” Is it true that $D(n) = O(n^{1/2+\epsilon})$ for all $\epsilon > 0$?

74 Approximately how many distinct values are there in the set
$\{0! \bmod p,\ 1! \bmod p,\ \ldots,\ (p-1)! \bmod p\}$, as $p \to \infty$?
Binomial Coefficients


LET’S TAKE A BREATHER. The previous chapters have seen some heavy
going, with sums involving floor, ceiling, mod, phi, and mu functions. Now
we’re going to study binomial coefficients, which turn out to be (a) more
important in applications, and (b) easier to manipulate, than all those other
quantities.

Lucky us!


5.1 BASIC IDENTITIES 


Otherwise known 
as combinations of 
n things, k at a 
time. 


The symbol $\binom nk$ is a binomial coefficient, so called because of an
important property we look at later this section, the binomial theorem. But we
read the symbol “n choose k.” This incantation arises from its combinatorial
interpretation: it is the number of ways to choose a k-element subset from
an n-element set. For example, from the set {1,2,3,4} we can choose two
elements in six ways,

{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4};

so $\binom42 = 6$.



To express the number (£) in more familiar terms it’s easiest to first 
determine the number of k-element sequences, rather than subsets, chosen 
from an n-element set; for sequences, the order of the elements counts. We 
use the same argument we used in Chapter 4 to show that n! is the number 
of permutations of n objects. There are n choices for the first element of the 
sequence; for each, there are $n-1$ choices for the second; and so on, until there
are $n-k+1$ choices for the kth. This gives $n(n-1)\ldots(n-k+1) = n^{\underline k}$ choices
in all. And since each k-element subset has exactly $k!$ different orderings, this
number of sequences counts each subset exactly $k!$ times. To get our answer,
we simply divide by $k!$:

$$\binom nk = \frac{n(n-1)\ldots(n-k+1)}{k(k-1)\ldots(1)}.$$

For example,

$$\binom42 = \frac{4\cdot3}{2\cdot1} = 6;$$

this agrees with our previous enumeration.

We call n the upper index and k the lower index. The indices are 
restricted to be nonnegative integers by the combinatorial interpretation, be- 
cause sets don’t have negative or fractional numbers of elements. But the 
binomial coefficient has many uses besides its combinatorial interpretation, 
so we will remove some of the restrictions. It’s most useful, it turns out, 
to allow an arbitrary real (or even complex) number to appear in the upper 
index, and to allow an arbitrary integer in the lower. Our formal definition 
therefore takes the following form: 



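$$\binom rk = \begin{cases} \dfrac{r(r-1)\ldots(r-k+1)}{k(k-1)\ldots(1)} = \dfrac{r^{\underline k}}{k!}, & \text{integer } k \ge 0;\\[1ex] 0, & \text{integer } k < 0. \end{cases} \tag{5.1}$$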


This definition has several noteworthy features. First, the upper index is 
called r, not n; the letter r emphasizes the fact that binomial coefficients make 
sense when any real number appears in this position. For instance, we have 
$\binom{-1}{3} = (-1)(-2)(-3)/(3\cdot2\cdot1) = -1$. There’s no combinatorial interpretation
here, but $r = -1$ turns out to be an important special case. A noninteger
index like $r = -1/2$ also turns out to be useful.

Second, we can view $\binom rk$ as a kth-degree polynomial in r. We’ll see that
this viewpoint is often helpful. 

Third, we haven’t defined binomial coefficients for noninteger lower in- 
dices. A reasonable definition can be given, but actual applications are rare, 
so we will defer this generalization to later in the chapter. 

Final note: We’ve listed the restrictions ‘integer $k \ge 0$’ and ‘integer
k < 0’ at the right of the definition. Such restrictions will be listed in all 
the identities we will study, so that the range of applicability will be clear. 
In general the fewer restrictions the better, because an unrestricted identity 
is most useful; still, any restrictions that apply are an important part of 
the identity. When we manipulate binomial coefficients, it’s easier to ignore 
difficult-to-remember restrictions temporarily and to check later that nothing 
has been violated. But the check needs to be made. 

For example, almost every time we encounter $\binom nn$ it equals 1, so we can
get lulled into thinking that it’s always 1. But a careful look at definition (5.1)
tells us that $\binom nn$ is 1 only when $n \ge 0$ (assuming that n is an integer); when
$n < 0$ we have $\binom nn = 0$. Traps like this can (and will) make life adventuresome.
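Definition (5.1) translates almost line for line into code. The Python sketch below is an illustration built on our own assumptions. The helper name `binom` is hypothetical and is not a library routine. We use exact `fractions.Fraction` arithmetic so that rational upper indices such as $r = -1/2$ come out exactly. The sketch reproduces the values quoted above, including the trap that $\binom nn = 0$ for negative n.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1).

    The upper index r may be any rational number (int or Fraction);
    the lower index k must be an integer. The result is an exact Fraction.
    """
    if not isinstance(k, int):
        raise TypeError("the lower index must be an integer")
    if k < 0:
        return Fraction(0)              # the 'integer k < 0' case
    result = Fraction(1)
    for j in range(k):                  # r(r-1)...(r-k+1) / k(k-1)...(1)
        result *= (Fraction(r) - j) / (j + 1)
    return result

print(binom(4, 2))                      # 6
print(binom(-1, 3))                     # -1
print(binom(Fraction(-1, 2), 2))        # 3/8
print([int(binom(n, n)) for n in range(-3, 3)])   # [0, 0, 0, 1, 1, 1]
```

Building the product one factor at a time keeps every intermediate value equal to $\binom r{j+1}$, so no large factorials are formed.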
Binomial coefficients
were well known
in Asia, many
centuries before Pascal
was born [90], but
he had no way to
know that.


In Italy it’s called 
Tartaglia’s triangle. 


Before getting to the identities that we will use to tame binomial coefficients,
let’s take a peek at some small values. The numbers in Table 155 form
the beginning of Pascal’s triangle, named after Blaise Pascal (1623–1662)
because he wrote an influential treatise about them [285]. The empty entries
in this table are actually 0’s, because of a zero in the numerator of (5.1); for
example, $\binom12 = (1\cdot0)/(2\cdot1) = 0$. These entries have been left blank simply to
help emphasize the rest of the table.

Table 155 Pascal’s triangle.

It’s worthwhile to memorize formulas for the first three columns,

$$\binom r0 = 1, \qquad \binom r1 = r, \qquad \binom r2 = \frac{r(r-1)}{2}; \tag{5.2}$$

these hold for arbitrary reals. (Recall that $\binom{n+1}{2} = \frac12 n(n+1)$ is the formula
we derived for triangular numbers in Chapter 1; triangular numbers are
conspicuously present in the $\binom n2$ column of Table 155.) It’s also a good idea to
memorize the first five rows or so of Pascal’s triangle, so that when the
pattern 1, 4, 6, 4, 1 appears in some problem we will have a clue that binomial
coefficients probably lurk nearby.

The numbers in Pascal’s triangle satisfy, practically speaking, infinitely 
many identities, so it’s not too surprising that we can find some surprising 
relationships by looking closely. For example, there’s a curious “hexagon 
property,” illustrated by the six numbers 56, 28, 36, 120, 210, 126 that sur- 
round 84 in the lower right portion of Table 155. Both ways of multiplying 
alternate numbers from this hexagon give the same product: $56\cdot36\cdot210 = 28\cdot120\cdot126 = 423360$.
The same thing holds if we extract such a hexagon
from any other part of Pascal’s triangle. 
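The hexagon property is easy to spot-check by machine. The Python sketch below is our own illustration, and `hexagon_products` is a hypothetical helper. It uses the standard library's `math.comb` (available since Python 3.8) and tests every interior entry of the first thirty rows. An exhaustive check over a finite range is evidence, not a proof.

```python
from math import comb

def hexagon_products(n, k):
    """The two alternating products of the six entries surrounding C(n, k)."""
    first = comb(n - 1, k) * comb(n, k - 1) * comb(n + 1, k + 1)
    second = comb(n - 1, k - 1) * comb(n + 1, k) * comb(n, k + 1)
    return first, second

print(hexagon_products(9, 3))   # (423360, 423360): the hexagon around 84 = C(9, 3)

# Every interior position of the first 30 rows.
for n in range(2, 31):
    for k in range(1, n):
        first, second = hexagon_products(n, k)
        assert first == second, (n, k)
```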
And now the identities. Our goal in this section will be to learn a few
simple rules by which we can solve the vast majority of practical problems 
involving binomial coefficients. 

Definition (5.1) can be recast in terms of factorials in the common case 
that the upper index r is an integer, n, that’s greater than or equal to the 
lower index k: 

$$\binom nk = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n \ge k \ge 0. \tag{5.3}$$

To get this formula, we just multiply the numerator and denominator of (5.1) 
by $(n-k)!$. It’s occasionally useful to expand a binomial coefficient into this
factorial form (for example, when proving the hexagon property). And we 
often want to go the other way, changing factorials into binomials. 

The factorial representation hints at a symmetry in Pascal’s triangle: 
Each row reads the same left-to-right as right-to-left. The identity reflecting 
this — called the symmetry identity — is obtained by changing k to n — k: 


“C’est une chose 
estrange combien 
il est fertile en 
proprietez. ” 

— B. Pascal [285] 





$$\binom nk = \binom{n}{n-k}, \qquad \text{integer } n \ge 0,\ \text{integer } k. \tag{5.4}$$


This formula makes combinatorial sense, because by specifying the k chosen 
things out of n we’re in effect specifying the n — k unchosen things. 

The restriction that n and k be integers in identity (5.4) is obvious, since
each lower index must be an integer. But why can’t n be negative? Suppose,
for example, that $n = -1$. Is

$$\binom{-1}{k} = \binom{-1}{-1-k}$$

a valid equation? No. For instance, when $k = 0$ we get 1 on the left and 0 on
the right. In fact, for any integer $k \ge 0$ the left side is

$$\binom{-1}{k} = \frac{(-1)(-2)\ldots(-k)}{k!} = (-1)^k,$$

which is either 1 or $-1$; but the right side is 0, because the lower index is
negative. And for negative k the left side is 0 but the right side is

$$\binom{-1}{-1-k} = (-1)^{-1-k},$$

which is either 1 or $-1$. So the equation ‘$\binom{-1}{k} = \binom{-1}{-1-k}$’ is always false!

The symmetry identity fails for all other negative integers n, too. But
unfortunately it’s all too easy to forget this restriction, since the expression
in the upper index is sometimes negative only for obscure (but legal) values
of its variables. Everyone who’s manipulated binomial coefficients much has
fallen into this trap at least three times.

I just hope I don’t
fall into this trap
during the midterm.

But the symmetry identity does have a big redeeming feature: It works
for all values of k, even when $k < 0$ or $k > n$. (Because both sides are zero in
such cases.) Otherwise $0 \le k \le n$, and symmetry follows immediately from
(5.3):

$$\binom nk = \frac{n!}{k!\,(n-k)!} = \frac{n!}{\bigl(n-(n-k)\bigr)!\,(n-k)!} = \binom{n}{n-k}.$$
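Both the redeeming feature and the failure for $n = -1$ can be confirmed by brute force. The sketch below re-implements definition (5.1) with exact fractions. The helpers `binom` and `symmetry_holds` are our own and are not library routines. It then checks (5.4) over a small range that includes $k < 0$ and $k > n$.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

def symmetry_holds(n, k):
    """Does (5.4), C(n, k) = C(n, n - k), hold for this particular n and k?"""
    return binom(n, k) == binom(n, n - k)

# For integer n >= 0, symmetry holds for every integer k, even k < 0 or k > n.
print(all(symmetry_holds(n, k) for n in range(8) for k in range(-3, n + 4)))   # True

# For n = -1 it fails for every k: one side is 0 and the other is +1 or -1.
print(any(symmetry_holds(-1, k) for k in range(-10, 10)))                      # False
```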

Our next important identity lets us move things in and out of binomial
coefficients:

$$\binom rk = \frac rk \binom{r-1}{k-1}, \qquad \text{integer } k \ne 0. \tag{5.5}$$


The restriction on k prevents us from dividing by 0 here. We call (5.5)
an absorption identity, because we often use it to absorb a variable into a
binomial coefficient when that variable is a nuisance outside. The equation
follows from definition (5.1), because $r^{\underline k} = r(r-1)^{\underline{k-1}}$ and $k! = k(k-1)!$
when $k > 0$; both sides are zero when $k < 0$.

If we multiply both sides of (5.5) by k, we get an absorption identity that
works even when $k = 0$:

$$k\binom rk = r\binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.6}$$

This one also has a companion that keeps the lower index intact:

$$(r-k)\binom rk = r\binom{r-1}{k}, \qquad \text{integer } k. \tag{5.7}$$


We can derive (5.7) by sandwiching an application of (5.6) between two
applications of symmetry:

$$\begin{aligned}
(r-k)\binom rk &= (r-k)\binom{r}{r-k} && \text{(by symmetry)}\\
&= r\binom{r-1}{r-k-1} && \text{(by (5.6))}\\
&= r\binom{r-1}{k} && \text{(by symmetry)}
\end{aligned}$$


But wait a minute. We’ve claimed that the identity holds for all real r,
yet the derivation we just gave holds only when r is a positive integer. (The
upper index $r-1$ must be a nonnegative integer if we’re to use the symmetry
property (5.4) with impunity.) Have we been cheating? No. It’s true that
the derivation is valid only for positive integers r; but we can claim that the
identity holds for all values of r, because both sides of (5.7) are polynomials
in r of degree $k+1$. A nonzero polynomial of degree d or less can have at
most d distinct zeros; therefore the difference of two such polynomials, which
also has degree d or less, cannot be zero at more than d points unless it is
identically zero. In other words, if two polynomials of degree d or less agree
at more than d points, they must agree everywhere. We have shown that
$(r-k)\binom rk = r\binom{r-1}{k}$ whenever r is a positive integer; so these two polynomials
agree at infinitely many points, and they must be identically equal.
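The polynomial argument also suggests a cheap computational check. The sketch below is our own illustration, and its `binom` helper is a hypothetical exact implementation of (5.1). It tests identity (5.7) at many noninteger values of r. The arithmetic is exact, and for fixed k both sides are polynomials in r of degree at most $k+1$. So, by the argument above, agreement at more than $k+1$ distinct values of r is enough to establish (5.7) for the particular k tested.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

# Distinct rational sample points, most of them not integers.
samples = sorted({Fraction(p, q) for p in range(-7, 8) for q in (1, 2, 3, 7)})

def check_5_7(k):
    """Check (r - k) C(r, k) == r C(r - 1, k) at every sample point r."""
    return all((r - k) * binom(r, k) == r * binom(r - 1, k) for r in samples)

print(len(samples), all(check_5_7(k) for k in range(-2, 6)))   # 45 True
```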

The proof technique in the previous paragraph, which we will call the 
polynomial argument, is useful for extending many identities from integers 
to reals; we’ll see it again and again. Some equations, like the symmetry 
identity (5.4), are not identities between polynomials, so we can’t always use 
this method. But many identities do have the necessary form. 

For example, here’s another polynomial identity, perhaps the most im- 
portant binomial identity of all, known as the addition formula : 

$$\binom rk = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.8}$$


When r is a positive integer, the addition formula tells us that every number 
in Pascal’s triangle is the sum of two numbers in the previous row, one directly 
above it and the other just to the left. And the formula applies also when r 
is negative, real, or complex; the only restriction is that k be an integer, so 
that the binomial coefficients are defined. 

One way to prove the addition formula is to assume that r is a positive 
integer and to use the combinatorial interpretation. Recall that $\binom rk$ is the
number of possible k-element subsets chosen from an r-element set. If we
have a set of r eggs that includes exactly one bad egg, there are $\binom rk$ ways to
select k of the eggs. Exactly $\binom{r-1}{k}$ of these selections involve nothing but good
eggs; and $\binom{r-1}{k-1}$ of them contain the bad egg, because such selections have
$k-1$ of the $r-1$ good eggs. Adding these two numbers together gives (5.8).
This derivation assumes that r is a positive integer, and that $k \ge 0$. But
both sides of the identity are zero when $k < 0$, and the polynomial argument
establishes (5.8) in all remaining cases.
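For small r the egg argument can be replayed literally by listing subsets. The sketch below is our own illustration. Egg 0 plays the bad egg, and `egg_counts` is a hypothetical name. It relies only on `itertools.combinations` and `math.comb` from the Python standard library.

```python
from itertools import combinations
from math import comb

def egg_counts(r, k):
    """Split the k-element selections from eggs 0..r-1 (egg 0 is bad)
    into those with only good eggs and those containing the bad egg."""
    good_only = with_bad = 0
    for chosen in combinations(range(r), k):
        if 0 in chosen:
            with_bad += 1
        else:
            good_only += 1
    return good_only, with_bad

print(egg_counts(5, 2))   # (6, 4): C(4, 2) and C(4, 1), which add up to C(5, 2) = 10

for r in range(1, 9):
    for k in range(r + 1):
        good_only, with_bad = egg_counts(r, k)
        assert good_only == comb(r - 1, k)
        assert with_bad == (comb(r - 1, k - 1) if k > 0 else 0)
        assert good_only + with_bad == comb(r, k)      # addition formula (5.8)
```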

We can also derive (5.8) by adding together the two absorption identities
(5.7) and (5.6):

$$(r-k)\binom rk + k\binom rk = r\binom{r-1}{k} + r\binom{r-1}{k-1};$$

the left side is $r\binom rk$, and we can divide through by r. This derivation is valid
for everything but $r = 0$, and it’s easy to check that remaining case.


(Well, not here 
anyway. ) 
Those of us who tend not to discover such slick proofs, or who are
otherwise into tedium, might prefer to derive (5.8) by a straightforward
manipulation of the definition. If $k > 0$,

$$\begin{aligned}
\binom{r-1}{k} + \binom{r-1}{k-1} &= \frac{(r-1)^{\underline k}}{k!} + \frac{(r-1)^{\underline{k-1}}}{(k-1)!}\\
&= \frac{(r-1)^{\underline{k-1}}\,(r-k)}{k!} + \frac{(r-1)^{\underline{k-1}}\,k}{k!}\\
&= \frac{(r-1)^{\underline{k-1}}\,r}{k!} = \frac{r^{\underline k}}{k!} = \binom rk.
\end{aligned}$$

Again, the cases for $k \le 0$ are easy to handle.

We’ve just seen three rather different proofs of the addition formula. This 
is not surprising; binomial coefficients have many useful properties, several of 
which are bound to lead to proofs of an identity at hand. 

The addition formula is essentially a recurrence for the numbers of Pas- 
cal’s triangle, so we’ll see that it is especially useful for proving other identities 
by induction. We can also get a new identity immediately by unfolding the 
recurrence. For example, 



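$$\begin{aligned}
\binom53 &= \binom43 + \binom42\\
&= \binom43 + \binom32 + \binom31\\
&= \binom43 + \binom32 + \binom21 + \binom20\\
&= \binom43 + \binom32 + \binom21 + \binom10 + \binom1{-1}.
\end{aligned}$$

Since $\binom{1}{-1} = 0$, that term disappears and we can stop. This method yields
the general formula

$$\sum_{k\le n} \binom{r+k}{k} = \binom{r+n+1}{n}, \qquad \text{integer } n. \tag{5.9}$$

Notice that we don’t need the lower limit $k \ge 0$ on the index of summation,
because the terms with $k < 0$ are zero.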

This formula expresses one binomial coefficient as the sum of others whose 
upper and lower indices stay the same distance apart. We found it by repeat- 
edly expanding the binomial coefficient with the smallest lower index: first 





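$\binom53$, then $\binom42$, then $\binom31$, then $\binom20$. What happens if we unfold the other way,
repeatedly expanding the one with largest lower index? We get

$$\begin{aligned}
\binom53 &= \binom42 + \binom43\\
&= \binom42 + \binom32 + \binom33\\
&= \binom42 + \binom32 + \binom22 + \binom23\\
&= \binom42 + \binom32 + \binom22 + \binom12 + \binom13\\
&= \binom42 + \binom32 + \binom22 + \binom12 + \binom02 + \binom03.
\end{aligned}$$

Now $\binom03$ is zero (so are $\binom12$ and $\binom02$, but these make the identity nicer), and
we can spot the general pattern:

$$\sum_{0\le k\le n} \binom km = \binom{n+1}{m+1}, \qquad \text{integers } m, n \ge 0. \tag{5.10}$$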
This identity, which we call summation on the upper index , expresses a 
binomial coefficient as the sum of others whose lower indices are constant. In 
this case the sum needs the lower limit $k \ge 0$, because the terms with $k < 0$
aren’t zero. Also, m and n can’t in general be negative.

Identity (5.10) has an interesting combinatorial interpretation. If we want
to choose $m+1$ tickets from a set of $n+1$ tickets numbered 0 through n,
there are $\binom km$ ways to do this when the largest ticket selected is number k.
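Identities (5.9) and (5.10), together with the ticket interpretation, are easy to spot-check. The Python sketch below is only an illustration, and every function name in it is our own. We use an exact `binom` that implements (5.1) because (5.9) allows a noninteger r. For (5.10), whose indices are nonnegative integers, `math.comb` suffices.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

def check_5_9(r, n):
    # Terms with k < 0 vanish, so summing from k = 0 is enough.
    return sum(binom(r + k, k) for k in range(n + 1)) == binom(r + n + 1, n)

def check_5_10(m, n):
    return sum(comb(k, m) for k in range(n + 1)) == comb(n + 1, m + 1)

def tickets_by_largest(m, n):
    """Count the (m+1)-subsets of tickets 0..n according to their largest ticket."""
    counts = [0] * (n + 1)
    for chosen in combinations(range(n + 1), m + 1):
        counts[max(chosen)] += 1
    return counts

print(tickets_by_largest(2, 6))   # [0, 0, 1, 3, 6, 10, 15], i.e. C(k, 2) for k = 0..6
```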

We can prove both (5.9) and (5.10) by induction using the addition 
formula, but we can also prove them from each other. For example, let’s 
prove (5.9) from (5.10); our proof will illustrate some common binomial co- 
efficient manipulations. Our general plan will be to massage the left side 
$\sum_{k\le n}\binom{r+k}{k}$ of (5.9) so that it looks like the left side $\sum_{0\le k\le n}\binom km$ of (5.10); then we’ll
invoke that identity, replacing the sum by a single binomial coefficient; finally 
we’ll transform that coefficient into the right side of (5.9). 

We can assume for convenience that r and n are nonnegative integers; 
the general case of (5.9) follows from this special case, by the polynomial 
argument. Let’s write m instead of r, so that this variable looks more like 
a nonnegative integer. The plan can now be carried out systematically as 
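follows:

$$\begin{aligned}
\sum_{k\le n} \binom{m+k}{k} &= \sum_{-m\le k\le n} \binom{m+k}{k}\\
&= \sum_{-m\le k\le n} \binom{m+k}{m}\\
&= \sum_{0\le k\le m+n} \binom km\\
&= \binom{m+n+1}{m+1}\\
&= \binom{m+n+1}{n}.
\end{aligned}$$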


Let’s look at this derivation blow by blow. The key step is in the second line, 
where we apply the symmetry law (5.4) to replace $\binom{m+k}{k}$ by $\binom{m+k}{m}$. We’re
allowed to do this only when $m+k \ge 0$, so our first step restricts the range
of k by discarding the terms with $k < -m$. (This is legal because those terms
are zero.) Now we’re almost ready to apply (5.10); the third line sets this up,
replacing k by $k-m$ and tidying up the range of summation. This step, like
the first, merely plays around with $\sum$-notation. Now k appears by itself in
the upper index and the limits of summation are in the proper form, so the 
fourth line applies (5.10). One more use of symmetry finishes the job. 

Certain sums that we did in Chapters 1 and 2 were actually special cases 
of (5.10), or disguised versions of this identity. For example, the case m = 1 
gives the sum of the nonnegative integers up through n: 




$$\binom01 + \binom11 + \cdots + \binom n1 = 0 + 1 + \cdots + n = \frac{(n+1)n}{2} = \binom{n+1}{2}.$$


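And the general case is equivalent to Chapter 2’s rule

$$\sum_{0\le k\le n} k^{\underline m} = \frac{(n+1)^{\underline{m+1}}}{m+1}, \qquad \text{integers } m, n \ge 0,$$

if we divide both sides of this formula by $m!$. In fact, the addition formula
(5.8) tells us that

$$\Delta\biggl(\binom xm\biggr) = \binom{x+1}{m} - \binom xm = \binom{x}{m-1},$$

if we replace r and k respectively by $x+1$ and m. Hence the methods of
Chapter 2 give us the handy indefinite summation formula

$$\sum \binom xm \,\delta x = \binom{x}{m+1} + C. \tag{5.11}$$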
Binomial coefficients get their name from the binomial theorem, which
deals with powers of the binomial expression x + y. Let’s look at the smallest
cases of this theorem:

$$\begin{aligned}
(x+y)^0 &= 1x^0y^0\\
(x+y)^1 &= 1x^1y^0 + 1x^0y^1\\
(x+y)^2 &= 1x^2y^0 + 2x^1y^1 + 1x^0y^2\\
(x+y)^3 &= 1x^3y^0 + 3x^2y^1 + 3x^1y^2 + 1x^0y^3\\
(x+y)^4 &= 1x^4y^0 + 4x^3y^1 + 6x^2y^2 + 4x^1y^3 + 1x^0y^4.
\end{aligned}$$

It’s not hard to see why these coefficients are the same as the numbers in
Pascal’s triangle: When we expand the product


“At the age of 
twenty-one 
he [Moriarty] wrote 
a treatise upon the 
Binomial Theorem, 
which has had a Eu- 
ropean vogue. On 
the strength of it, 
he won the Math- 
ematical Chair at 
one of our smaller 
Universities.” 

— S. Holmes [84] 


$$(x+y)^n = \underbrace{(x+y)(x+y)\ldots(x+y)}_{n \text{ factors}},$$


every term is itself the product of n factors, each either an x or y. The number
of such terms with k factors of x and $n-k$ factors of y is the coefficient
of $x^k y^{n-k}$ after we combine like terms. And this is exactly the number of
ways to choose k of the n binomials from which an x will be contributed; that
is, it’s $\binom nk$.

Some textbooks leave the quantity $0^0$ undefined, because the functions
$x^0$ and $0^x$ have different limiting values when x decreases to 0. But this is a
mistake. We must define

$$x^0 = 1, \qquad \text{for all } x,$$

if the binomial theorem is to be valid when $x = 0$, $y = 0$, and/or $x = -y$.
The theorem is too important to be arbitrarily restricted! By contrast, the
function $0^x$ is quite unimportant. (See [220] for further discussion.)

But what exactly is the binomial theorem? In its full glory it is the
following identity:

$$(x+y)^r = \sum_k \binom rk x^k y^{r-k}, \qquad \text{integer } r \ge 0 \text{ or } |x/y| < 1. \tag{5.12}$$

The sum is over all integers k; but it is really a finite sum when r is a
nonnegative integer, because all terms are zero except those with $0 \le k \le r$. On the
other hand, the theorem is also valid when r is negative, or even when r is 
an arbitrary real or complex number. In such cases the sum really is infinite, 
and we must have |x/y| < 1 to guarantee the sum’s absolute convergence. 
(Chapter 9 tells the
meaning of O.)


Two special cases of the binomial theorem are worth special attention,
even though they are extremely simple. If $x = y = 1$ and $r = n$ is nonnegative,
we get

$$2^n = \binom n0 + \binom n1 + \cdots + \binom nn, \qquad \text{integer } n \ge 0.$$


This equation tells us that row n of Pascal’s triangle sums to $2^n$. And when
x is $-1$ instead of $+1$, we get

$$0^n = \binom n0 - \binom n1 + \cdots + (-1)^n\binom nn, \qquad \text{integer } n \ge 0.$$

For example, $1-4+6-4+1 = 0$; the elements of row n sum to zero if we
give them alternating signs, except in the top row (when $n = 0$ and $0^0 = 1$).
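The product argument and the two special cases can be replayed numerically. In the sketch below, which is our own illustration with a hypothetical helper `expand`, multiplying out $(x+y)^n$ one factor at a time reproduces the rows of Pascal's triangle. Python, like this text, takes `0 ** 0` to be 1, so the alternating-sum check also works for the top row.

```python
from math import comb

def expand(n):
    """Coefficients of x^0 y^n, x^1 y^(n-1), ..., x^n y^0 in (x + y)^n,
    found by multiplying in one factor (x + y) at a time."""
    coeffs = [1]                                  # (x + y)^0 = 1
    for _ in range(n):
        # Multiplying by (x + y): each new coefficient is the sum of the
        # two coefficients above it, padded with zeros at the ends.
        coeffs = [a + b for a, b in zip([0] + coeffs, coeffs + [0])]
    return coeffs

print(expand(4))                                  # [1, 4, 6, 4, 1]

for n in range(12):
    row = expand(n)
    assert row == [comb(n, k) for k in range(n + 1)]
    assert sum(row) == 2 ** n                                       # row n sums to 2^n
    assert sum((-1) ** k * c for k, c in enumerate(row)) == 0 ** n  # alternating sum
```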

When r is not a nonnegative integer, we most often use the binomial 
theorem in the special case y = 1. Let’s state this special case explicitly, 
writing z instead of x to emphasize the fact that an arbitrary complex number 
can be involved here: 

$$(1+z)^r = \sum_k \binom rk z^k, \qquad |z| < 1. \tag{5.13}$$

The general formula in (5.12) follows from this one if we set $z = x/y$ and
multiply both sides by $y^r$.
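For a noninteger r the sum in (5.13) is infinite, but we can watch its partial sums converge. The sketch below is only an illustration: it uses floating point and a hypothetical `binomial_series` helper. It builds each term from the previous one using the ratio $\binom r{k+1}\big/\binom rk = (r-k)/(k+1)$, which follows from definition (5.1).

```python
from math import isclose

def binomial_series(r, z, terms):
    """Partial sum of (5.13): the sum of C(r, k) z^k for 0 <= k < terms."""
    total = 0.0
    term = 1.0                          # C(r, 0) z^0
    for k in range(terms):
        total += term
        term *= (r - k) / (k + 1) * z   # turn C(r, k) z^k into C(r, k+1) z^(k+1)
    return total

# r = -1/2 and z = 1/2: the partial sums approach (3/2)^(-1/2).
for terms in (5, 10, 20, 40):
    print(terms, binomial_series(-0.5, 0.5, terms))
print(isclose(binomial_series(-0.5, 0.5, 60), 1.5 ** -0.5))   # True

# For r a nonnegative integer the series terminates, so |z| < 1 is not needed.
print(binomial_series(3, 2.0, 10))      # 27 up to rounding, i.e. (1 + 2)^3
```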

We have proved the binomial theorem only when r is a nonnegative in- 
teger, by using a combinatorial interpretation. We can’t deduce the general 
case from the nonnegative-integer case by using the polynomial argument, 
because the sum is infinite in the general case. But when r is arbitrary, we 
can use Taylor series and the theory of complex variables: 


$$f(z) = \frac{f(0)}{0!}z^0 + \frac{f'(0)}{1!}z^1 + \frac{f''(0)}{2!}z^2 + \cdots = \sum_{k\ge0} \frac{f^{(k)}(0)}{k!} z^k.$$

The derivatives of the function $f(z) = (1+z)^r$ are easily evaluated; in fact,
$f^{(k)}(z) = r^{\underline k}(1+z)^{r-k}$. Setting $z = 0$ gives (5.13).

We also need to prove that the infinite sum converges, when $|z| < 1$. It
does, because $\binom rk = O(k^{-1-r})$ by equation (5.83) below.

Now let’s look more closely at the values of $\binom nk$ when n is a negative
integer. One way to approach these values is to use the addition law (5.8) to
fill in the entries that lie above the numbers in Table 155, thereby obtaining
Table 164. For example, we must have $\binom{-1}{0} = 1$, since $\binom00 = \binom{-1}{0} + \binom{-1}{-1}$ and
$\binom{-1}{-1} = 0$; then we must have $\binom{-1}{1} = -1$, since $\binom01 = \binom{-1}{1} + \binom{-1}{0}$; and so on.
Table 164 Pascal’s triangle, extended upward.

   n   k=0   k=1   k=2   k=3   k=4   k=5   k=6   k=7   k=8   k=9  k=10
  -4     1    -4    10   -20    35   -56    84  -120   165  -220   286
  -3     1    -3     6   -10    15   -21    28   -36    45   -55    66
  -2     1    -2     3    -4     5    -6     7    -8     9   -10    11
  -1     1    -1     1    -1     1    -1     1    -1     1    -1     1
   0     1     0     0     0     0     0     0     0     0     0     0


All these numbers are familiar. Indeed, the rows and columns of Ta- 
ble 164 appear as columns in Table 155 (but minus the minus signs). So 
there must be a connection between the values of $\binom nk$ for negative n and the
values for positive n. The general rule is

$$\binom rk = (-1)^k \binom{k-r-1}{k}, \qquad \text{integer } k; \tag{5.14}$$

it is easily proved, since

$$\begin{aligned}
r^{\underline k} &= r(r-1)\ldots(r-k+1)\\
&= (-1)^k(-r)(1-r)\ldots(k-1-r) = (-1)^k (k-r-1)^{\underline k}
\end{aligned}$$

when $k \ge 0$, and both sides are zero when $k < 0$.

Identity (5.14) is particularly valuable because it holds without any re- 
striction. (Of course, the lower index must be an integer so that the binomial 
coefficients are defined.) The transformation in (5.14) is called negating the 
upper index, or “upper negation.” 
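Upper negation is easy to test with exact arithmetic. The sketch below is our own. Again `binom` implements (5.1), and `negated` is a hypothetical name for the right side of (5.14). It compares both sides for several rational r and regenerates the rows $n = -4, \ldots, 0$ of Table 164 directly from definition (5.1).

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

def negated(r, k):
    """Right-hand side of (5.14): (-1)^k C(k - r - 1, k)."""
    sign = -1 if k % 2 else 1
    return sign * binom(k - r - 1, k)

rs = [Fraction(-7, 2), Fraction(-1, 3), 0, Fraction(2, 5), 3, 6]
print(all(binom(r, k) == negated(r, k) for r in rs for k in range(-3, 9)))   # True

for n in range(-4, 1):                      # the rows of Table 164
    print(n, [int(binom(n, k)) for k in range(11)])
```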

But how can we remember this important formula? The other identities 
we’ve seen — symmetry, absorption, addition, etc. — are pretty simple, but 
this one looks rather messy. Still, there’s a mnemonic that’s not too bad: To 
negate the upper index, we begin by writing down $(-1)^k$, where k is the lower
index. (The lower index doesn’t change.) Then we immediately write k again, 
twice, in both lower and upper index positions. Then we negate the original 
upper index by subtracting it from the new upper index. And we complete 
the job by subtracting 1 more (always subtracting, not adding, because this 
is a negation process). 

You call this a
mnemonic? I’d call
it pneumatic:
full of air.

It does help me
remember, though.

(Now is a good
time to do warmup
exercise 4.)

It’s also frustrating,
if we’re trying to
get somewhere else.

Let’s negate the upper index twice in succession, for practice. We get

$$\binom rk = (-1)^k\binom{k-r-1}{k} = (-1)^{2k}\binom{k-(k-r-1)-1}{k} = \binom rk,$$

so we’re right back where we started. This is probably not what the framers of
the identity intended; but it’s reassuring to know that we haven’t gone astray. 

Some applications of (5.14) are, of course, more useful than this. We can 
use upper negation, for example, to move quantities between upper and lower 
index positions. The identity has a symmetric formulation,

$$(-1)^m\binom{-n-1}{m} = (-1)^n\binom{-m-1}{n}, \qquad \text{integers } m, n \ge 0, \tag{5.15}$$

which holds because both sides are equal to $\binom{m+n}{n}$.

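Upper negation can also be used to derive the following interesting sum:

$$\sum_{k\le m} \binom rk (-1)^k = (-1)^m \binom{r-1}{m}, \qquad \text{integer } m. \tag{5.16}$$

The idea is to negate the upper index, then apply (5.9), and negate again:

$$\begin{aligned}
\sum_{k\le m} \binom rk (-1)^k &= \sum_{k\le m} \binom{k-r-1}{k}\\
&= \binom{-r+m}{m}\\
&= (-1)^m \binom{r-1}{m}.
\end{aligned}$$

(Here double negation
helps, because
we’ve sandwiched
another operation in
between.)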

This formula gives us a partial sum of the rth row of Pascal’s triangle, provided 
that the entries of the row have been given alternating signs. For instance, if 
$r = 5$ and $m = 2$ the formula gives $1 - 5 + 10 = 6 = (-1)^2\binom42$.

Notice that if $m \ge r$, (5.16) gives the alternating sum of the entire row,
and this sum is zero when r is a positive integer. We proved this before, when
we expanded $(1-1)^r$ by the binomial theorem; it’s interesting to know that
the partial sums of this expression can also be evaluated in closed form.
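The closed form (5.16) for alternating partial row sums can be checked the same way. The sketch below is our own illustration and again uses an exact `binom` for (5.1). It includes the example $r = 5$, $m = 2$ from the text, a few noninteger values of r, and some values of $m \ge r$, where the whole row is summed.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

def alternating_partial_sum(r, m):
    """Left side of (5.16): the sum of C(r, k)(-1)^k over 0 <= k <= m."""
    return sum(binom(r, k) * (-1) ** k for k in range(m + 1))

def closed_form(r, m):
    """Right side of (5.16): (-1)^m C(r - 1, m)."""
    return (-1 if m % 2 else 1) * binom(r - 1, m)

print(alternating_partial_sum(5, 2), closed_form(5, 2))      # 6 6
print(all(alternating_partial_sum(r, m) == closed_form(r, m)
          for r in (Fraction(-3, 2), Fraction(1, 3), 4, 7)
          for m in range(-2, 10)))                            # True
```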

How about the simpler partial sum,

$$\sum_{k\le m} \binom rk \, ? \tag{5.17}$$

Surely if we can evaluate the corresponding sum with alternating signs, we
ought to be able to do this one? But no; there is no closed form for the partial 
sum of a row of Pascal’s triangle. We can do columns — that’s (5.10) — but 
not rows. Curiously, however, there is a way to partially sum the row elements
if they have been multiplied by their distance from the center:

$$\sum_{k\le m} \binom rk \Bigl(\frac r2 - k\Bigr) = \frac{m+1}{2}\binom{r}{m+1}, \qquad \text{integer } m. \tag{5.18}$$

(This formula is easily verified by induction on m.) The relation between 
these partial sums with and without the factor of $(r/2 - k)$ in the summand
is analogous to the relation between the integrals

$$\int_{-\infty}^{\alpha} x e^{-x^2}\,dx = -\tfrac12 e^{-\alpha^2} \qquad\text{and}\qquad \int_{-\infty}^{\alpha} e^{-x^2}\,dx.$$

The apparently more complicated integral on the left, with the factor of x,
has a closed form, while the simpler-looking integral on the right, without the
factor, has none. Appearances can be deceiving.
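Before attempting the induction, we can check identity (5.18) mechanically. This sketch is ours. The `binom` helper implements (5.1) with exact fractions, and the sample values of r are arbitrary. It compares both sides of (5.18) and includes the small case $r = 4$, $m = 1$, where each side equals 6.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1); k must be an integer."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result *= (Fraction(r) - j) / (j + 1)
    return result

def weighted_partial_sum(r, m):
    """Left side of (5.18): the sum of C(r, k)(r/2 - k) over 0 <= k <= m."""
    return sum(binom(r, k) * (Fraction(r) / 2 - k) for k in range(m + 1))

def closed_form(r, m):
    """Right side of (5.18): (m + 1)/2 * C(r, m + 1)."""
    return Fraction(m + 1, 2) * binom(r, m + 1)

print(weighted_partial_sum(4, 1), closed_form(4, 1))         # 6 6
print(all(weighted_partial_sum(r, m) == closed_form(r, m)
          for r in (Fraction(-1, 2), Fraction(5, 3), 4, 9)
          for m in range(-3, 12)))                            # True
```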

Near the end of this chapter, we’ll study a method by which it’s possible 
to determine whether or not there is a closed form for the partial sums of a 
given series involving binomial coefficients, in a fairly general setting. This 
method is capable of discovering identities (5.16) and (5.18), and it also will 
tell us that (5.17) is a dead end. 

(Well, the right-hand integral is $\frac12\sqrt{\pi}\,(1+\operatorname{erf}\alpha)$, a constant plus a multiple of the “error function” of $\alpha$, if we’re willing to accept that as a closed form.)

Partial sums of the binomial series lead to a curious relationship of another kind:

$$\sum_{k\le m}\binom{m+r}{k}x^k y^{m-k} = \sum_{k\le m}\binom{-r}{k}(-x)^k(x+y)^{m-k}, \qquad \text{integer } m. \qquad (5.19)$$


This identity isn’t hard to prove by induction: Both sides are zero when $m < 0$ and 1 when $m = 0$. If we let $S_m$ stand for the sum on the left, we can apply the addition formula (5.8) and show easily that

$$S_m = \sum_{k\le m}\binom{m-1+r}{k}x^k y^{m-k} + \sum_{k\le m}\binom{m-1+r}{k-1}x^k y^{m-k},$$

and

$$\sum_{k\le m}\binom{m-1+r}{k}x^k y^{m-k} = yS_{m-1} + \binom{m-1+r}{m}x^m, \qquad \sum_{k\le m}\binom{m-1+r}{k-1}x^k y^{m-k} = xS_{m-1},$$

when $m > 0$. Since upper negation gives $\binom{m-1+r}{m}x^m = \binom{-r}{m}(-x)^m$, we have

$$S_m = (x+y)S_{m-1} + \binom{-r}{m}(-x)^m,$$

and this recurrence is satisfied also by the right-hand side of (5.19). By induction, both sides must be equal; QED.

But there’s a neater proof. When $r$ is an integer in the range $0 \ge r \ge -m$, the binomial theorem tells us that both sides of (5.19) are $(x+y)^{m+r}y^{-r}$. And since both sides are polynomials in $r$ of degree $m$ or less, agreement at $m+1$ different values is enough (but just barely!) to prove equality in general.
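Because (5.19) is claimed for all $r$, a spot check at non-integer $r$ and arbitrary rational $x$ and $y$ is a fair test of both sides. The sketch below is ours, assuming Python's `fractions` module; `lhs_519` and `rhs_519` are hypothetical names for the two sides of the identity.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient for rational r and integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def lhs_519(m, r, x, y):
    """Left side of (5.19): sum over 0 <= k <= m of binom(m + r, k) x^k y^(m-k)."""
    return sum(binom(m + r, k) * x**k * y ** (m - k) for k in range(m + 1))


def rhs_519(m, r, x, y):
    """Right side of (5.19): sum over 0 <= k <= m of binom(-r, k) (-x)^k (x + y)^(m-k)."""
    return sum(binom(-r, k) * (-x) ** k * (x + y) ** (m - k) for k in range(m + 1))


r, x, y = Fraction(1, 3), Fraction(2), Fraction(-5, 7)
print(all(lhs_519(m, r, x, y) == rhs_519(m, r, x, y) for m in range(8)))  # True
```

Setting `x, y = -1, 1` in `lhs_519` reproduces the special case $\binom{-r}{m}$ discussed next.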

It may seem foolish to have an identity where one sum equals another. Neither side is in closed form. But sometimes one side turns out to be easier to evaluate than the other. For example, if we set $x = -1$ and $y = 1$, we get

$$\sum_{k\le m}\binom{m+r}{k}(-1)^k = \binom{-r}{m}, \qquad \text{integer } m\ge 0,$$


an alternative form of identity (5.16). And if we set $x = y = 1$ and $r = m+1$, we get

$$\sum_{k\le m}\binom{2m+1}{k} = \sum_{k\le m}\binom{m+k}{k}2^{m-k}.$$

(There’s a nice combinatorial proof of this formula [247].)

The left-hand side sums just half of the binomial coefficients with upper index $2m+1$, and these are equal to their counterparts in the other half because Pascal’s triangle has left-right symmetry. Hence the left-hand side is just $\frac12\cdot 2^{2m+1} = 2^{2m}$. This yields a formula that is quite unexpected,

$$\sum_{k\le m}\binom{m+k}{k}2^{-k} = 2^m, \qquad \text{integer } m\ge 0. \qquad (5.20)$$

Let’s check it when $m = 2$: $\binom{2}{0} + \frac12\binom{3}{1} + \frac14\binom{4}{2} = 1 + \frac32 + \frac64 = 4$. Astounding.
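The two facts behind (5.20), that half of row $2m+1$ sums to $2^{2m}$ and that the weighted column sum equals $2^m$, are easy to confirm for many $m$ at once. A minimal sketch of our own, assuming Python 3.8+ for `math.comb`; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def half_row_sum(m):
    """Sum of binom(2m+1, k) over 0 <= k <= m: the left half of row 2m+1."""
    return sum(comb(2 * m + 1, k) for k in range(m + 1))


def sum_520(m):
    """Left side of (5.20): sum over 0 <= k <= m of binom(m + k, k) / 2^k."""
    return sum(comb(m + k, k) * Fraction(1, 2**k) for k in range(m + 1))


print(half_row_sum(2), sum_520(2))  # 16 4
```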

So far we’ve been looking either at binomial coefficients by themselves or 
at sums of terms in which there’s only one binomial coefficient per term. But 
many of the challenging problems we face involve products of two or more 
binomial coefficients, so we’ll spend the rest of this section considering how 
to deal with such cases. 

Here’s a handy rule that often helps to simplify the product of two binomial coefficients:

$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \qquad (5.21)$$

We’ve already seen the special case $k = 1$; it’s the absorption identity (5.6). Although both sides of (5.21) are products of binomial coefficients, one side often is easier to sum because of interactions with the rest of a formula. For example, the left side uses $m$ twice, the right side uses it only once. Therefore we usually want to replace $\binom{r}{m}\binom{m}{k}$ by $\binom{r}{k}\binom{r-k}{m-k}$ when summing on $m$.

Equation (5.21) holds primarily because of cancellation between $m!$’s in the factorial representations of $\binom{r}{m}$ and $\binom{m}{k}$. If all variables are integers and $r \ge m \ge k \ge 0$, we have

$$\binom{r}{m}\binom{m}{k} = \frac{r!}{m!\,(r-m)!}\cdot\frac{m!}{k!\,(m-k)!} = \frac{r!}{k!\,(m-k)!\,(r-m)!} = \frac{r!}{k!\,(r-k)!}\cdot\frac{(r-k)!}{(m-k)!\,(r-m)!} = \binom{r}{k}\binom{r-k}{m-k}.$$

That was easy. Furthermore, if $m < k$ or $k < 0$, both sides of (5.21) are zero; so the identity holds for all integers $m$ and $k$. Finally, the polynomial argument extends its validity to all real $r$.
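The claim that (5.21) holds for all integers $m$ and $k$ (including the cases where both sides vanish) and for all real $r$ can be spot-checked directly. This sketch is ours, assuming Python's `fractions` module; `trinomial_revision_holds` is a hypothetical helper name.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient for rational r and integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def trinomial_revision_holds(r, m, k):
    """Compare both sides of (5.21): binom(r, m) binom(m, k) vs binom(r, k) binom(r - k, m - k)."""
    return binom(r, m) * binom(m, k) == binom(r, k) * binom(r - k, m - k)


print(trinomial_revision_holds(Fraction(-7, 2), 5, 2))  # True
```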

A binomial coefficient $\binom{r}{k} = r!/(r-k)!\,k!$ can be written in the form $(a+b)!/a!\,b!$ after a suitable renaming of variables. Similarly, the quantity in the middle of the derivation above, $r!/k!\,(m-k)!\,(r-m)!$, can be written in the form $(a+b+c)!/a!\,b!\,c!$. This is a “trinomial coefficient,” which arises in the “trinomial theorem”:

$$(x+y+z)^n = \sum_{\substack{0\le a,b,c\le n\\ a+b+c=n}}\frac{(a+b+c)!}{a!\,b!\,c!}\,x^a y^b z^c = \sum_{\substack{0\le a,b,c\le n\\ a+b+c=n}}\binom{a+b+c}{b+c}\binom{b+c}{c}\,x^a y^b z^c.$$

So $\binom{r}{m}\binom{m}{k}$ is really a trinomial coefficient in disguise. Trinomial coefficients pop up occasionally in applications, and we can conveniently write them as

$$\binom{a+b+c}{a,\,b,\,c} = \frac{(a+b+c)!}{a!\,b!\,c!}$$

in order to emphasize the symmetry present.
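One way to see the trinomial theorem at work is to expand $(x+y+z)^n$ by brute force and compare each coefficient with $(a+b+c)!/a!\,b!\,c!$ and with the product $\binom{a+b+c}{b+c}\binom{b+c}{c}$. This is an illustrative sketch of our own; the function names are hypothetical, and a monomial $x^a y^b z^c$ is represented by its exponent triple $(a, b, c)$.

```python
from collections import Counter
from math import comb, factorial


def expand_trinomial_power(n):
    """Return a Counter mapping (a, b, c) to the coefficient of x^a y^b z^c in (x + y + z)^n."""
    coeffs = Counter({(0, 0, 0): 1})
    for _ in range(n):
        step = Counter()
        for (a, b, c), v in coeffs.items():
            # multiply the current polynomial by (x + y + z)
            step[(a + 1, b, c)] += v
            step[(a, b + 1, c)] += v
            step[(a, b, c + 1)] += v
        coeffs = step
    return coeffs


def trinomial(a, b, c):
    """(a + b + c)! / (a! b! c!)."""
    return factorial(a + b + c) // (factorial(a) * factorial(b) * factorial(c))


def trinomial_as_binomials(a, b, c):
    """binom(a + b + c, b + c) * binom(b + c, c)."""
    return comb(a + b + c, b + c) * comb(b + c, c)


coeffs = expand_trinomial_power(4)
print(coeffs[(2, 1, 1)], trinomial(2, 1, 1), trinomial_as_binomials(2, 1, 1))  # 12 12 12
```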

Binomial and trinomial coefficients generalize to multinomial coefficients, which are always expressible as products of binomial coefficients:

$$\binom{a_1+a_2+\cdots+a_m}{a_1,\,a_2,\,\ldots,\,a_m} = \frac{(a_1+a_2+\cdots+a_m)!}{a_1!\,a_2!\,\ldots\,a_m!} = \binom{a_1+a_2+\cdots+a_m}{a_2+\cdots+a_m}\cdots\binom{a_{m-1}+a_m}{a_m}.$$

Therefore, when we run across such a beastie, our standard techniques apply.


Yeah, right. 


“Some time ago I devised a marvelous rule for the numerical coefficients of powers, not only of the binomial x + y, but also of the trinomial x + y + z, and indeed of any polynomial whatsoever; so that, given a power of any degree, say the tenth, and a term contained in its value, such as x^5y^3z^2, I can at once assign the numerical coefficient it must have, without any table already calculated.”

— G.W. Leibniz [245] 
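The product formula for multinomial coefficients can be checked against the factorial formula, and Leibniz's example (the coefficient of $x^5y^3z^2$ in a tenth power) makes a convenient test case. A minimal sketch of our own, assuming Python 3.8+ for `math.comb` and `math.prod`; both function names are hypothetical.

```python
from math import comb, factorial, prod


def multinomial(parts):
    """(a1 + ... + am)! / (a1! a2! ... am!) computed directly from factorials."""
    return factorial(sum(parts)) // prod(factorial(a) for a in parts)


def multinomial_via_binomials(parts):
    """binom(a1+...+am, a2+...+am) * binom(a2+...+am, a3+...+am) * ... * binom(a_{m-1}+a_m, a_m)."""
    result = 1
    remaining = sum(parts)          # a_i + ... + a_m at the start of step i
    for a in parts[:-1]:
        rest = remaining - a        # a_{i+1} + ... + a_m
        result *= comb(remaining, rest)
        remaining = rest
    return result


# Leibniz's example: the coefficient of x^5 y^3 z^2 in (x + y + z)^10.
print(multinomial([5, 3, 2]), multinomial_via_binomials([5, 3, 2]))  # 2520 2520
```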
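Table 169 Sums of products of binomial coefficients.

$$\sum_k\binom{r}{m+k}\binom{s}{n-k} = \binom{r+s}{m+n}, \qquad \text{integers } m, n. \qquad (5.22)$$

$$\sum_k\binom{l}{m+k}\binom{s}{n+k} = \binom{l+s}{l-m+n}, \qquad \text{integer } l\ge 0, \text{ integers } m, n. \qquad (5.23)$$

$$\sum_k\binom{l}{m+k}\binom{s+k}{n}(-1)^k = (-1)^{l+m}\binom{s-m}{n-l}, \qquad \text{integer } l\ge 0, \text{ integers } m, n. \qquad (5.24)$$

$$\sum_{k\le l}\binom{l-k}{m}\binom{s}{k-n}(-1)^k = (-1)^{l+m}\binom{s-m-1}{l-m-n}, \qquad \text{integers } l, m, n\ge 0. \qquad (5.25)$$

$$\sum_{0\le k\le l}\binom{l-k}{m}\binom{q+k}{n} = \binom{l+q+1}{m+n+1}, \qquad \text{integers } l, m\ge 0, \text{ integers } n\ge q\ge 0. \qquad (5.26)$$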


Fold down the 
corner on this page, 
so you can find the 
table quickly later. 
You’ll need it! 


Now we come to Table 169, which lists identities that are among the most 
important of our standard techniques. These are the ones we rely on when 
struggling with a sum involving a product of two binomial coefficients. Each 
of these identities is a sum over k, with one appearance of k in each binomial 
coefficient; there also are four nearly independent parameters, called m, n, r, 
etc., one in each index position. Different cases arise depending on whether k 
appears in the upper or lower index, and on whether it appears with a plus or 
minus sign. Sometimes there’s an additional factor of $(-1)^k$, which is needed
to make the terms summable in closed form. 
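Because Table 169 is meant for reference rather than memorization, it may help to have a mechanical check of each entry over a grid of small parameters. The sketch below is ours: it assumes Python's `fractions` module, uses hypothetical helpers `binom` and `sign`, and lets a nonnegative integer `l` play the role of $r$ in (5.22) so that every sum over $k$ is finite. A returned empty list means every checked case agreed.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient: rational r, integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def sign(k):
    """(-1)**k as an exact integer, also for negative k."""
    return 1 if k % 2 == 0 else -1


K = range(-6, 9)  # for the small parameters below, every nonzero term has k in this window


def table_169_mismatches():
    """Return the parameter tuples at which (5.22)-(5.26) fail; [] means every check passed."""
    bad = []
    for l in range(5):                      # l >= 0 keeps the k-range finite
        for m in range(-1, 4):
            for n in range(-1, 4):
                for s in range(-3, 4):      # s may be negative
                    if sum(binom(l, m + k) * binom(s, n - k) for k in K) != binom(l + s, m + n):
                        bad.append(("5.22", l, m, n, s))
                    if sum(binom(l, m + k) * binom(s, n + k) for k in K) != binom(l + s, l - m + n):
                        bad.append(("5.23", l, m, n, s))
                    lhs = sum(binom(l, m + k) * binom(s + k, n) * sign(k) for k in K)
                    if lhs != sign(l + m) * binom(s - m, n - l):
                        bad.append(("5.24", l, m, n, s))
                    if m >= 0 and n >= 0:   # (5.25) needs l, m, n >= 0; terms with k < n vanish
                        lhs = sum(binom(l - k, m) * binom(s, k - n) * sign(k) for k in range(n, l + 1))
                        if lhs != sign(l + m) * binom(s - m - 1, l - m - n):
                            bad.append(("5.25", l, m, n, s))
    for l in range(5):
        for m in range(4):
            for n in range(4):
                for q in range(n + 1):      # (5.26) needs n >= q >= 0
                    lhs = sum(binom(l - k, m) * binom(q + k, n) for k in range(l + 1))
                    if lhs != binom(l + q + 1, m + n + 1):
                        bad.append(("5.26", l, m, n, q))
    return bad


print(table_169_mismatches())  # []
```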

Table 169 is far too complicated to memorize in full; it is intended only 
for reference. But the first identity in this table is by far the most memorable, 
and it should be remembered. It states that the sum (over all integers k) of the 
product of two binomial coefficients, in which the upper indices are constant 
and the lower indices have a constant sum for all k, is the binomial coefficient 
obtained by summing both lower and upper indices. This identity is known 
as Vandermonde’s convolution, because Alexandre Vandermonde wrote a 
significant paper about it in the late 1700s [357]; it was, however, known 
to Chu Shih-Chieh in China as early as 1303. All of the other identities in 
Table 169 can be obtained from Vandermonde’s convolution by doing things 
like negating upper indices or applying the symmetry law, etc., with care; 
therefore Vandermonde’s convolution is the most basic of all. 

We can prove Vandermonde’s convolution by giving it a nice combinatorial interpretation. If we replace $k$ by $k-m$ and $n$ by $n-m$, we can assume that $m = 0$; hence the identity to be proved is

$$\sum_k\binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \qquad (5.27)$$


Let $r$ and $s$ be nonnegative integers; the general case then follows by the polynomial argument. On the right side, $\binom{r+s}{n}$ is the number of ways to choose $n$ people from among $r$ men and $s$ women. On the left, each term of the sum is the number of ways to choose $k$ of the men and $n-k$ of the women. Summing over all $k$ counts each possibility exactly once.
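The counting argument can be acted out literally: list every $n$-person group, sort the groups by the number $k$ of men, and compare the tallies with $\binom{r}{k}\binom{s}{n-k}$. This brute-force sketch is ours (the name `committees_by_men` is hypothetical) and is only practical for small $r$ and $s$.

```python
from itertools import combinations
from math import comb


def committees_by_men(r, s, n):
    """Count the n-person groups drawn from r men and s women, tallied by the number k of men."""
    people = [("man", i) for i in range(r)] + [("woman", j) for j in range(s)]
    tally = {}
    for group in combinations(people, n):
        k = sum(1 for kind, _ in group if kind == "man")
        tally[k] = tally.get(k, 0) + 1
    return tally


tally = committees_by_men(4, 3, 3)
print(sorted(tally.items()))            # [(0, 1), (1, 12), (2, 18), (3, 4)]
print(sum(tally.values()), comb(7, 3))  # 35 35
```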

Much more often than not we use these identities left to right, since that’s 
the direction of simplification. But every once in a while it pays to go the 
other direction, temporarily making an expression more complicated. When 
this works, we’ve usually created a double sum for which we can interchange 
the order of summation and then simplify. 

Before moving on let’s look at proofs for two more of the identities in Table 169. It’s easy to prove (5.23); all we need to do is replace the first binomial coefficient by $\binom{l}{l-m-k}$, then Vandermonde’s (5.22) applies.

The next one, (5.24), is a bit more difficult. We can reduce it to Vandermonde’s convolution by a sequence of transformations, but we can just
as easily prove it by resorting to the old reliable technique of mathematical 
induction. Induction is often the first thing to try when nothing else obvious 
jumps out at us, and induction on l works just fine here. 

For the basis $l = 0$, all terms are zero except when $k = -m$; so both sides of the equation are $(-1)^m\binom{s-m}{n}$. Now suppose that the identity holds for all values less than some fixed $l$, where $l > 0$. We can use the addition formula to replace $\binom{l}{m+k}$ by $\binom{l-1}{m+k} + \binom{l-1}{m+k-1}$; the original sum now breaks into two sums, each of which can be evaluated by the induction hypothesis:

$$\sum_k\binom{l-1}{m+k}\binom{s+k}{n}(-1)^k + \sum_k\binom{l-1}{m-1+k}\binom{s+k}{n}(-1)^k = (-1)^{l-1+m}\binom{s-m}{n-l+1} + (-1)^{l+m}\binom{s-m+1}{n-l+1}.$$

And this simplifies to the right-hand side of (5.24), if we apply the addition formula once again.

Two things about this derivation are worthy of note. First, we see again 
the great convenience of summing over all integers k, not just over a certain 
range, because there’s no need to fuss over boundary conditions. Second, 
the addition formula works nicely with mathematical induction, because it’s 
a recurrence for binomial coefficients. A binomial coefficient whose upper 
index is $l$ is expressed in terms of two whose upper indices are $l-1$, and
that’s exactly what we need to apply the induction hypothesis. 


Sexist! You mentioned men first.
So much for Table 169. What about sums with three or more binomial coefficients? If the index of summation is spread over all the coefficients, our chances of finding a closed form aren’t great: Only a few closed forms are known for sums of this kind, hence the sum we need might not match the given specs. One of these rarities, proved in exercise 43, is

$$\sum_k\binom{m-r+s}{k}\binom{n+r-s}{n-k}\binom{r+k}{m+n} = \binom{r}{m}\binom{s}{n}, \qquad \text{integers } m, n\ge 0. \qquad (5.28)$$

Here’s another, more symmetric example:

$$\sum_k\binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+a}{c+k}(-1)^k = \frac{(a+b+c)!}{a!\,b!\,c!}, \qquad \text{integers } a, b, c\ge 0. \qquad (5.29)$$

This one has a two-coefficient counterpart,

$$\sum_k\binom{a+b}{a+k}\binom{b+a}{b+k}(-1)^k = \frac{(a+b)!}{a!\,b!}, \qquad \text{integers } a, b\ge 0, \qquad (5.30)$$

which incidentally doesn’t appear in Table 169. The analogous four-coefficient sum doesn’t have a closed form, but a similar sum does:

$$\sum_k(-1)^k\binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+d}{c+k}\binom{d+a}{d+k}\biggm/\binom{2a+2b+2c+2d}{a+b+c+d+k} = \frac{(a+b+c+d)!\,(a+b+c)!\,(a+b+d)!\,(a+c+d)!\,(b+c+d)!}{(2a+2b+2c+2d)!\,(a+c)!\,(b+d)!\,a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d\ge 0.$$

This was discovered by John Dougall [82] early in the twentieth century.
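Identities with three or more factors are easy to mistype, so a brute-force check over small parameters is a useful guard. The sketch below is ours, not the book's: it assumes Python's `fractions` and `math` modules, uses hypothetical helper names, and evaluates both sides of (5.28), (5.29), (5.30) and Dougall's identity exactly.

```python
from fractions import Fraction
from math import factorial


def binom(r, k):
    """Generalized binomial coefficient: rational r, integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def sign(k):
    """(-1)**k as an exact integer, also for negative k."""
    return 1 if k % 2 == 0 else -1


def sum_528(m, n, r, s):
    # The first two factors confine k to 0 <= k <= n.
    return sum(binom(m - r + s, k) * binom(n + r - s, n - k) * binom(r + k, m + n)
               for k in range(n + 1))


def sum_529(a, b, c):
    w = a + b + c  # every nonzero term has |k| <= w
    return sum(sign(k) * binom(a + b, a + k) * binom(b + c, b + k) * binom(c + a, c + k)
               for k in range(-w, w + 1))


def sum_530(a, b):
    w = a + b
    return sum(sign(k) * binom(a + b, a + k) * binom(b + a, b + k) for k in range(-w, w + 1))


def dougall_lhs(a, b, c, d):
    t = a + b + c + d
    total = Fraction(0)
    for k in range(-t, t + 1):  # nonzero terms have |k| <= t, where the denominator is nonzero
        numerator = (binom(a + b, a + k) * binom(b + c, b + k)
                     * binom(c + d, c + k) * binom(d + a, d + k))
        total += sign(k) * numerator / binom(2 * t, t + k)
    return total


def dougall_rhs(a, b, c, d):
    f = factorial
    return Fraction(
        f(a + b + c + d) * f(a + b + c) * f(a + b + d) * f(a + c + d) * f(b + c + d),
        f(2 * (a + b + c + d)) * f(a + c) * f(b + d) * f(a) * f(b) * f(c) * f(d),
    )


print(sum_529(1, 1, 1), dougall_lhs(1, 1, 1, 1), dougall_rhs(1, 1, 1, 1))  # 6 27/140 27/140
```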

Is Dougall’s identity the hairiest sum of binomial coefficients known? No! The champion so far is

$$\sum_{k_{ij}}\Biggl(\prod_{1\le i<j<n}(-1)^{k_{ij}}\binom{a_i+a_j}{a_j+k_{ij}}\Biggr)\prod_{1\le j<n}\binom{a_j+a_n}{a_n+\sum_{q<j}k_{qj}-\sum_{j<q<n}k_{jq}} = \binom{a_1+a_2+\cdots+a_n}{a_1,\,a_2,\,\ldots,\,a_n}, \qquad \text{integers } a_1, a_2, \ldots, a_n\ge 0. \qquad (5.31)$$

Here the sum is over $\binom{n-1}{2}$ index variables $k_{ij}$ for $1\le i<j<n$. Equation (5.29) is the special case $n = 3$; the case $n = 4$ can be written out as follows, if we use $(a, b, c, d)$ for $(a_1, a_2, a_3, a_4)$ and $(i, j, k)$ for $(k_{12}, k_{13}, k_{23})$:

$$\sum_{i,j,k}(-1)^{i+j+k}\binom{a+b}{b+i}\binom{a+c}{c+j}\binom{b+c}{c+k}\binom{a+d}{d-i-j}\binom{b+d}{d+i-k}\binom{c+d}{d+j+k} = \frac{(a+b+c+d)!}{a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d\ge 0.$$


The left side of (5.31) is the coefficient of $z_1^0 z_2^0\ldots z_n^0$ after the product of $n(n-1)$ fractions

$$\prod_{\substack{1\le i,j\le n\\ i\ne j}}\Bigl(1-\frac{z_i}{z_j}\Bigr)^{a_i}$$

has been fully expanded into positive and negative powers of the $z$’s. The right side of (5.31) was conjectured by Freeman Dyson in 1962 and proved by several people shortly thereafter. Exercise 89 gives a “simple” proof of (5.31).
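The constant-term description of (5.31) can be turned directly into a computation: expand the product of the factors $(1 - z_i/z_j)^{a_i}$ as a Laurent polynomial and read off the coefficient of $z_1^0\ldots z_n^0$. This sketch is ours, for small exponents only; it stores a Laurent polynomial as a hypothetical `{exponent tuple: coefficient}` dictionary, assumes Python 3.8+ (`math.comb`, `math.prod`), and also evaluates the written-out $n = 4$ sum.

```python
from collections import defaultdict
from math import comb, factorial, prod


def multiply(p, q):
    """Multiply two Laurent polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(x + y for x, y in zip(e1, e2))] += c1 * c2
    return dict(out)


def dyson_constant_term(a):
    """Coefficient of z_1^0 ... z_n^0 in the product over i != j of (1 - z_i/z_j)^(a_i)."""
    n = len(a)
    poly = {(0,) * n: 1}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            factor = {}
            for t in range(a[i] + 1):
                # the term binom(a_i, t) (-z_i/z_j)^t of the binomial expansion
                e = [0] * n
                e[i] += t
                e[j] -= t
                factor[tuple(e)] = comb(a[i], t) * (-1) ** t
            poly = multiply(poly, factor)
    return poly.get((0,) * n, 0)


def dyson_n4_sum(a, b, c, d):
    """The n = 4 case written out with free indices i, j, k."""
    def C(top, bottom):
        # math.comb rejects a negative lower index; such coefficients are 0
        return comb(top, bottom) if bottom >= 0 else 0
    total = 0
    for i in range(-b, a + 1):      # only these ranges keep the first three factors nonzero
        for j in range(-c, a + 1):
            for k in range(-c, b + 1):
                s = 1 if (i + j + k) % 2 == 0 else -1
                total += s * (C(a + b, b + i) * C(a + c, c + j) * C(b + c, c + k)
                              * C(a + d, d - i - j) * C(b + d, d + i - k) * C(c + d, d + j + k))
    return total


def multinomial(a):
    return factorial(sum(a)) // prod(factorial(x) for x in a)


print(dyson_constant_term((1, 2, 3)), multinomial((1, 2, 3)))  # 60 60
```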
Another noteworthy identity involving lots of binomial coefficients is

$$\sum_{j,k}(-1)^{j+k}\binom{j+k}{k+l}\binom{r}{j}\binom{n}{k}\binom{s+n-j-k}{m-j} = (-1)^l\binom{n+r}{n+l}\binom{s-r}{m-n-l}, \qquad \text{integers } l, m, n;\ n\ge 0. \qquad (5.32)$$

This one, proved in exercise 83, even has a chance of arising in practical applications. But we’re getting far afield from our theme of “basic identities,” so we had better stop and take stock of what we’ve learned.
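A double sum like (5.32) is another place where a quick exhaustive check over small integers can catch transcription slips. The sketch below is ours and only covers integer $r\ge 0$ (so $\binom{r}{j}$ and $\binom{n}{k}$ keep both sums finite); `lhs_532` and `rhs_532` are hypothetical names.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient: rational r, integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def sign(k):
    """(-1)**k as an exact integer, also for negative k."""
    return 1 if k % 2 == 0 else -1


def lhs_532(l, m, n, r, s):
    """Left side of (5.32) for integer r >= 0; binom(r, j) and binom(n, k) confine j and k."""
    return sum(
        sign(j + k) * binom(j + k, k + l) * binom(r, j) * binom(n, k) * binom(s + n - j - k, m - j)
        for j in range(r + 1)
        for k in range(n + 1)
    )


def rhs_532(l, m, n, r, s):
    """Right side of (5.32)."""
    return sign(l) * binom(n + r, n + l) * binom(s - r, m - n - l)


print(lhs_532(0, 3, 1, 2, 5), rhs_532(0, 3, 1, 2, 5))  # 9 9
```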

We’ve seen that binomial coefficients satisfy an almost bewildering variety of identities. Some of these, fortunately, are easily remembered, and
we can use the memorable ones to derive most of the others in a few steps. 
Table 174 collects ten of the most useful formulas, all in one place; these are 
the best identities to know. 


5.2 BASIC PRACTICE 

In the previous section we derived a bunch of identities by manipulating sums and plugging in other identities. It wasn’t too tough to find those
derivations — we knew what we were trying to prove, so we could formulate 
a general plan and fill in the details without much trouble. Usually, however, 
out in the real world, we’re not faced with an identity to prove; we’re faced 
with a sum to simplify. And we don’t know what a simplified form might 
look like (or even if one exists). By tackling many such sums in this section 
and the next, we will hone our binomial coefficient tools. 
To start, let’s try our hand at a few sums involving a single binomial
coefficient. 


Algorithm self-teach:
1 read problem
2 attempt solution
3 skim book solution
4 if attempt failed goto 1
  else goto next problem

Unfortunately, that algorithm can put you in an infinite loop.

Suggested patches:
0 set c ← 0
3a set c ← c + 1
3b if c = N goto your TA


Problem 1: A sum of ratios.

We’d like to have a closed form for

$$\sum_{k=0}^{m}\binom{m}{k}\Bigm/\binom{n}{k}, \qquad \text{integers } n\ge m\ge 0.$$


At first glance this sum evokes panic, because we haven’t seen any identities that deal with a quotient of binomial coefficients. (Furthermore the sum involves two binomial coefficients, which seems to contradict the sentence preceding this problem.) However, just as we can use the factorial representations to reexpress a product of binomial coefficients as another product (that’s how we got identity (5.21)), we can do likewise with a quotient. In fact we can avoid the grubby factorial representations by letting $r = n$ and dividing both sides of equation (5.21) by $\binom{n}{k}\binom{n}{m}$; this yields

$$\binom{m}{k}\Bigm/\binom{n}{k} = \binom{n-k}{m-k}\Bigm/\binom{n}{m}.$$


So we replace the quotient on the left, which appears in our sum, by the one on the right; the sum becomes

$$\sum_{k=0}^{m}\binom{n-k}{m-k}\Bigm/\binom{n}{m}.$$

— E. W. Dijkstra 


We still have a quotient, but the binomial coefficient in the denominator doesn’t involve the index of summation $k$, so we can remove it from the sum. We’ll restore it later.

We can also simplify the boundary conditions by summing over all $k\ge 0$; the terms for $k > m$ are zero. The sum that’s left isn’t so intimidating:

$$\sum_{k\ge 0}\binom{n-k}{m-k}.$$

It’s similar to the one in identity (5.9), because the index $k$ appears twice with the same sign. But here it’s $-k$ and in (5.9) it’s not. The next step should therefore be obvious; there’s only one reasonable thing to do:

$$\sum_{k\ge 0}\binom{n-k}{m-k} = \sum_{m-k\ge 0}\binom{n-(m-k)}{m-(m-k)} = \sum_{k\le m}\binom{n-m+k}{k}.$$


. . . But this subchapter is called BASIC practice.
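Table 174 The top ten binomial coefficient identities.

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n\ge k\ge 0. \qquad \text{factorial expansion}$$

$$\binom{n}{k} = \binom{n}{n-k}, \qquad \text{integer } n\ge 0,\ \text{integer } k. \qquad \text{symmetry}$$

$$\binom{r}{k} = \frac{r}{k}\binom{r-1}{k-1}, \qquad \text{integer } k\ne 0. \qquad \text{absorption/extraction}$$

$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \qquad \text{addition/induction}$$

$$\binom{r}{k} = (-1)^k\binom{k-r-1}{k}, \qquad \text{integer } k. \qquad \text{upper negation}$$

$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \qquad \text{trinomial revision}$$

$$\sum_k\binom{r}{k}x^k y^{r-k} = (x+y)^r, \qquad \text{integer } r\ge 0, \text{ or } |x/y| < 1. \qquad \text{binomial theorem}$$

$$\sum_{k\le n}\binom{r+k}{k} = \binom{r+n+1}{n}, \qquad \text{integer } n. \qquad \text{parallel summation}$$

$$\sum_{0\le k\le n}\binom{k}{m} = \binom{n+1}{m+1}, \qquad \text{integers } m, n\ge 0. \qquad \text{upper summation}$$

$$\sum_k\binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \qquad \text{Vandermonde convolution}$$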


And now we can apply the parallel summation identity, (5.9):

$$\sum_{k\le m}\binom{n-m+k}{k} = \binom{(n-m)+m+1}{m} = \binom{n+1}{m}.$$

Finally we reinstate the $\binom{n}{m}$ in the denominator that we removed from the sum earlier, and then apply (5.7) to get the desired closed form:

$$\binom{n+1}{m}\Bigm/\binom{n}{m} = \frac{n+1}{n+1-m}.$$

This derivation actually works for any real value of $n$, as long as no division by zero occurs; that is, as long as $n$ isn’t one of the integers $0, 1, \ldots, m-1$.
Please, don’t remind me of the midterm.


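The more complicated the derivation, the more important it is to check the answer. This one wasn’t too complicated but we’ll check anyway. In the small case $m = 2$ and $n = 4$ we have

$$\binom{2}{0}\Bigm/\binom{4}{0} + \binom{2}{1}\Bigm/\binom{4}{1} + \binom{2}{2}\Bigm/\binom{4}{2} = 1 + \frac12 + \frac16 = \frac53;$$

yes, this agrees perfectly with our closed form $(4+1)/(4+1-2)$.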
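The hand check above can be extended to many cases, including the non-integer $n$ the text says the derivation also covers. A minimal sketch of our own, assuming Python's `fractions` module; `ratio_sum` and `ratio_sum_closed_form` are hypothetical names.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient for rational r and integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def ratio_sum(m, n):
    """The Problem 1 sum: sum over 0 <= k <= m of binom(m, k) / binom(n, k)."""
    return sum(binom(m, k) / binom(n, k) for k in range(m + 1))


def ratio_sum_closed_form(m, n):
    """The closed form (n + 1)/(n + 1 - m) derived in the text."""
    return Fraction(n + 1) / (n + 1 - m)


print(ratio_sum(2, 4), ratio_sum_closed_form(2, 4))  # 5/3 5/3
```

The non-integer values of $n$ in the test below are chosen so that no division by zero occurs, matching the proviso in the text.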

Problem 2: From the literature of sorting.

Our next sum appeared way back in ancient times (the early 1970s) before people were fluent with binomial coefficients. A paper that introduced an improved merging technique [196] concludes with the following remarks: “It can be shown that the expected number of saved transfers ... is given by the expression

$$T = \sum_{r=0}^{n} r\,\frac{{}_{m-r-1}C_{m-n-1}}{{}_{m}C_{n}}.$$

Here $m$ and $n$ are as defined above, and ${}_mC_n$ is the symbol for the number of combinations of $m$ objects taken $n$ at a time. . . . The author is grateful to the referee for reducing a more complex equation for expected transfers saved to the form given here.”

We’ll see that this is definitely not a final answer to the author’s problem. 
It’s not even a midterm answer. 

First we should translate the sum into something we can work with; the ghastly notation ${}_{m-r-1}C_{m-n-1}$ is enough to stop anybody, save the enthusiastic referee (please). In our language we’d write

$$T = \sum_{k=0}^{n} k\binom{m-k-1}{m-n-1}\Bigm/\binom{m}{n}, \qquad \text{integers } m > n\ge 0.$$


The binomial coefficient in the denominator doesn’t involve the index of summation, so we can remove it and work with the new sum

$$S = \sum_{k=0}^{n} k\binom{m-k-1}{m-n-1}.$$

What next? The index of summation appears in the upper index of the binomial coefficient but not in the lower index. So if the other $k$ weren’t there, we could massage the sum and apply summation on the upper index (5.10). With the extra $k$, though, we can’t. If we could somehow absorb that $k$ into the binomial coefficient, using one of our absorption identities, we could then sum on the upper index. Unfortunately those identities don’t work here. But if the $k$ were instead $m-k$, we could use absorption identity (5.6):

$$(m-k)\binom{m-k-1}{m-n-1} = (m-n)\binom{m-k}{m-n}.$$

So here’s the key: We’ll rewrite $k$ as $m-(m-k)$ and split the sum $S$ into two sums:

$$\begin{aligned}
S &= \sum_{k=0}^{n} k\binom{m-k-1}{m-n-1} = \sum_{k=0}^{n}\bigl(m-(m-k)\bigr)\binom{m-k-1}{m-n-1}\\
&= m\sum_{k=0}^{n}\binom{m-k-1}{m-n-1} - \sum_{k=0}^{n}(m-k)\binom{m-k-1}{m-n-1}\\
&= m\sum_{k=0}^{n}\binom{m-k-1}{m-n-1} - (m-n)\sum_{k=0}^{n}\binom{m-k}{m-n}\\
&= mA - (m-n)B,
\end{aligned}$$

where

$$A = \sum_{k=0}^{n}\binom{m-k-1}{m-n-1} \qquad\text{and}\qquad B = \sum_{k=0}^{n}\binom{m-k}{m-n}.$$

The sums $A$ and $B$ that remain are none other than our old friends in which the upper index varies while the lower index stays fixed. Let’s do $B$ first, because it looks simpler. A little bit of massaging is enough to make the summand match the left side of (5.10):

$$B = \sum_{0\le k\le n}\binom{m-k}{m-n} = \sum_{m-n\le k\le m}\binom{k}{m-n} = \sum_{0\le k\le m}\binom{k}{m-n}.$$

In the last step we’ve included the terms with $0\le k<m-n$ in the sum; they’re all zero, because the upper index is less than the lower. Now we sum on the upper index, using (5.10), and get

$$B = \sum_{0\le k\le m}\binom{k}{m-n} = \binom{m+1}{m-n+1}.$$
The other sum $A$ is the same, but with $m$ replaced by $m-1$, so $A = \binom{m}{m-n}$. Hence we have a closed form for the given sum $S$, which can be further simplified:

$$S = mA - (m-n)B = m\binom{m}{m-n} - (m-n)\binom{m+1}{m-n+1} = \Bigl(m - (m-n)\frac{m+1}{m-n+1}\Bigr)\binom{m}{m-n} = \frac{n}{m-n+1}\binom{m}{m-n}.$$

And this gives us a closed form for the original sum:

$$T = S\Bigm/\binom{m}{n} = \frac{n}{m-n+1}.$$

Even the referee can’t simplify this.

Do old exams ever die?

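Again we use a small case to check the answer. When $m = 4$ and $n = 2$, we have

$$T = 0\cdot\binom{3}{1}\Bigm/\binom{4}{2} + 1\cdot\binom{2}{1}\Bigm/\binom{4}{2} + 2\cdot\binom{1}{1}\Bigm/\binom{4}{2} = 0 + \frac26 + \frac26 = \frac23,$$

which agrees with our formula $2/(4-2+1)$.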
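Each intermediate result of Problem 2 ($A$, $B$, $S = mA-(m-n)B$ and the final $T$) can be checked separately, which is a handy way to localize a slip if one ever appears. The sketch below is ours, assuming Python 3.8+ for `math.comb`; all function names are hypothetical, and the ranges rely on $m > n \ge 0$ so that every `comb` argument is nonnegative.

```python
from fractions import Fraction
from math import comb


def S_sum(m, n):
    """S = sum over 0 <= k <= n of k * binom(m-k-1, m-n-1), for integers m > n >= 0."""
    return sum(k * comb(m - k - 1, m - n - 1) for k in range(n + 1))


def A_sum(m, n):
    return sum(comb(m - k - 1, m - n - 1) for k in range(n + 1))


def B_sum(m, n):
    return sum(comb(m - k, m - n) for k in range(n + 1))


def expected_saved_transfers(m, n):
    """T = S / binom(m, n), the referee's expression in binomial notation."""
    return Fraction(S_sum(m, n), comb(m, n))


print(expected_saved_transfers(4, 2), Fraction(2, 4 - 2 + 1))  # 2/3 2/3
```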

Problem 3: From an old exam.

Let’s do one more sum that involves a single binomial coefficient. This one, unlike the last, originated in the halls of academia; it was a problem on a take-home test. We want the value of $Q_{1000000}$ when

$$Q_n = \sum_{k\le 2^n}\binom{2^n-k}{k}(-1)^k, \qquad \text{integer } n\ge 0.$$


This one’s harder than the others; we can’t apply any of the identities we’ve seen so far. And we’re faced with a sum of $2^{1000000}$ terms, so we can’t just add them up. The index of summation $k$ appears in both indices, upper and lower, but with opposite signs. Negating the upper index doesn’t help, either; it removes the factor of $(-1)^k$, but it introduces a $2k$ in the upper index.

When nothing obvious works, we know that it’s best to look at small cases. If we can’t spot a pattern and prove it by induction, at least we’ll have some data for checking our results. Here are the nonzero terms and their sums for the first four values of $n$.

$$\begin{array}{c|l}
n & Q_n\\ \hline
0 & \binom{1}{0} = 1 = 1\\
1 & \binom{2}{0} - \binom{1}{1} = 1 - 1 = 0\\
2 & \binom{4}{0} - \binom{3}{1} + \binom{2}{2} = 1 - 3 + 1 = -1\\
3 & \binom{8}{0} - \binom{7}{1} + \binom{6}{2} - \binom{5}{3} + \binom{4}{4} = 1 - 7 + 15 - 10 + 1 = 0
\end{array}$$

We’d better not try the next case, $n = 4$; the chances of making an arithmetic error are too high. (Computing terms like $\binom{12}{4}$ and $\binom{9}{7}$ by hand, let alone combining them with the others, is worthwhile only if we’re desperate.)

So the pattern starts out $1, 0, -1, 0$. Even if we knew the next term or two, the closed form wouldn’t be obvious. But if we could find and prove a recurrence for $Q_n$ we’d probably be able to guess and prove its closed form. To find a recurrence, we need to relate $Q_n$ to $Q_{n-1}$ (or to $Q_{\text{smaller}}$ values); but to do this we need to relate a term like $\binom{128-13}{13}$, which arises when $n = 7$ and $k = 13$, to terms like $\binom{64-13}{13}$. This doesn’t look promising; we don’t know any neat relations between entries in Pascal’s triangle that are 64 rows apart. The addition formula, our main tool for induction proofs, only relates entries that are one row apart.

But this leads us to a key observation: There’s no need to deal with entries that are $2^{n-1}$ rows apart. The variable $n$ never appears by itself, it’s always in the context $2^n$. So the $2^n$ is a red herring! If we replace $2^n$ by $m$, all we need to do is find a closed form for the more general (but easier) sum

$$R_m = \sum_{k\le m}\binom{m-k}{k}(-1)^k, \qquad \text{integer } m\ge 0;$$

then we’ll also have a closed form for $Q_n = R_{2^n}$. And there’s a good chance that the addition formula will give us a recurrence for the sequence $R_m$.

Oh, the sneakiness of the instructor who set that exam.

Values of $R_m$ for small $m$ can be read from Table 155, if we alternately add and subtract values that appear in a southwest-to-northeast diagonal. The results are:

$$\begin{array}{c|rrrrrrrrrrr}
m & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ \hline
R_m & 1 & 1 & 0 & -1 & -1 & 0 & 1 & 1 & 0 & -1 & -1
\end{array}$$

There seems to be a lot of cancellation going on.

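Let’s look now at the formula for $R_m$ and see if it defines a recurrence. Our strategy is to apply the addition formula (5.8) and to find sums that have the form $R_k$ in the resulting expression, somewhat as we did in the perturbation method of Chapter 2:

$$\begin{aligned}
R_m &= \sum_{k\le m}\binom{m-k}{k}(-1)^k\\
&= \sum_{k\le m}\binom{m-1-k}{k}(-1)^k + \sum_{k\le m}\binom{m-1-k}{k-1}(-1)^k\\
&= \sum_{k\le m}\binom{m-1-k}{k}(-1)^k + \sum_{k+1\le m}\binom{m-2-k}{k}(-1)^{k+1}\\
&= \sum_{k\le m-1}\binom{m-1-k}{k}(-1)^k + \binom{-1}{m}(-1)^m - \sum_{k\le m-2}\binom{m-2-k}{k}(-1)^k - \binom{-1}{m-1}(-1)^{m-1}\\
&= R_{m-1} + (-1)^{2m} - R_{m-2} - (-1)^{2(m-1)} = R_{m-1} - R_{m-2}.
\end{aligned}$$

(In the next-to-last step we’ve used the formula $\binom{-1}{m} = (-1)^m$, which we know is true when $m\ge 0$.) This derivation is valid for $m\ge 2$.

Anyway those of us who’ve done warmup exercise 4 know it.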

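From this recurrence we can generate values of $R_m$ quickly, and we soon perceive that the sequence is periodic. Indeed,

$$R_m = \begin{cases}
1, & \text{if } m \bmod 6 = 0;\\
1, & \text{if } m \bmod 6 = 1;\\
0, & \text{if } m \bmod 6 = 2;\\
-1, & \text{if } m \bmod 6 = 3;\\
-1, & \text{if } m \bmod 6 = 4;\\
0, & \text{if } m \bmod 6 = 5.
\end{cases}$$

The proof by induction is by inspection. Or, if we must give a more academic proof, we can unfold the recurrence one step to obtain

$$R_m = (R_{m-2} - R_{m-3}) - R_{m-2} = -R_{m-3},$$

whenever $m\ge 3$. Hence $R_m = R_{m-6}$ whenever $m\ge 6$.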

Finally, since $Q_n = R_{2^n}$, we can determine $Q_n$ by determining $2^n \bmod 6$ and using the closed form for $R_m$. When $n = 0$ we have $2^0 \bmod 6 = 1$; after that we keep multiplying by 2 (mod 6), so the pattern 2, 4 repeats. Thus

$$Q_n = R_{2^n} = \begin{cases}
R_1 = 1, & \text{if } n = 0;\\
R_2 = 0, & \text{if } n \text{ is odd};\\
R_4 = -1, & \text{if } n > 0 \text{ is even}.
\end{cases}$$

This closed form for $Q_n$ agrees with the first four values we calculated when we started on the problem. We conclude that $Q_{1000000} = R_4 = -1$.
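The whole Problem 3 argument can be replayed mechanically: compute $R_m$ from its definition, confirm the recurrence and the period of 6, and then evaluate $Q_n$ through $2^n \bmod 6$ without ever forming $2^{1000000}$. A minimal sketch of our own, assuming Python 3.8+ for `math.comb`; `R` and `Q_closed_form` are hypothetical names.

```python
from math import comb


def R(m):
    """R_m = sum over k <= m of binom(m-k, k) (-1)^k; terms with k > m - k vanish."""
    return sum(comb(m - k, k) * (-1) ** k for k in range(m // 2 + 1))


def Q_closed_form(n):
    """Q_n = R_(2^n) via the period-6 pattern; pow(2, n, 6) avoids building 2^n."""
    period = [1, 1, 0, -1, -1, 0]  # R_0, ..., R_5
    return period[pow(2, n, 6)]


print([R(m) for m in range(11)])  # [1, 1, 0, -1, -1, 0, 1, 1, 0, -1, -1]
print(Q_closed_form(1000000))     # -1
```

Using Python's three-argument `pow` keeps the modular reduction cheap, mirroring the text's "keep multiplying by 2 (mod 6)".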
Problem 4: A sum involving two binomial coefficients.

Our next task is to find a closed form for

$$\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1}, \qquad \text{integers } m > n\ge 0.$$


Wait a minute. Where’s the second binomial coefficient promised in the title of this problem? And why should we try to simplify a sum we’ve already simplified? (This is the sum $S$ from Problem 2.)

Well, this is a sum that’s easier to simplify if we view the summand as a product of two binomial coefficients, and then use one of the general identities found in Table 169. The second binomial coefficient materializes when we rewrite $k$ as $\binom{k}{1}$:

$$S = \sum_{0\le k\le n}\binom{k}{1}\binom{m-k-1}{m-n-1}.$$

And identity (5.26) is the one to apply, since its index of summation appears in both upper indices and with opposite signs.

But our sum isn’t quite in the correct form yet. The upper limit of summation should be $m-1$, if we’re to have a perfect match with (5.26). No problem; the terms for $n < k\le m-1$ are zero. So we can plug in, with $(l, m, n, q) \leftarrow (m-1, m-n-1, 1, 0)$; the answer is

$$S = \binom{m}{m-n+1}.$$

This is cleaner than the formula we got before. We can convert it to the previous formula by using (5.7):

$$\binom{m}{m-n+1} = \frac{n}{m-n+1}\binom{m}{m-n}.$$

Similarly, we can get interesting results by plugging special values into the other general identities we’ve seen. Suppose, for example, that we set $m = n = 1$ and $q = 0$ in (5.26). Then the identity reads

$$\sum_{0\le k\le l}(l-k)k = \binom{l+1}{3}.$$

The left side is $l\bigl((l+1)l/2\bigr) - (1^2 + 2^2 + \cdots + l^2)$, so this gives us a brand new way to solve the sum-of-squares problem that we beat to death in Chapter 2.
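Both special cases of (5.26) used in Problem 4 can be checked in a few lines: the sum $S$ from Problem 2 equals $\binom{m}{m-n+1}$, and the sum of squares follows from $\sum_k (l-k)k = \binom{l+1}{3}$. The sketch is ours, assuming Python 3.8+ for `math.comb`; the function names are hypothetical.

```python
from math import comb


def problem4_sum(m, n):
    """Sum over 0 <= k <= n of binom(k, 1) * binom(m-k-1, m-n-1), for integers m > n >= 0."""
    return sum(comb(k, 1) * comb(m - k - 1, m - n - 1) for k in range(n + 1))


def squares_via_526(l):
    """1^2 + ... + l^2 recovered from sum_{0<=k<=l} (l-k)k = binom(l+1, 3)."""
    return l * (l * (l + 1) // 2) - comb(l + 1, 3)


print(problem4_sum(4, 2), comb(4, 4 - 2 + 1))  # 4 4
print(squares_via_526(10))                    # 385
```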

The moral of this story is: Special cases of very general sums are sometimes best handled in the general form. When learning general forms, it’s
wise to learn their simple specializations. 
So we should deep six this sum, right?


Problem 5: A sum with three factors. 

Here’s another sum that isn’t too bad. We wish to simplify

$$\sum_k\binom{n}{k}\binom{s}{k}k, \qquad \text{integer } n\ge 0.$$

The index of summation $k$ appears in both lower indices and with the same sign; therefore identity (5.23) in Table 169 looks close to what we need. With a bit of manipulation, we should be able to use it.

The biggest difference between (5.23) and what we have is the extra $k$ in our sum. But we can absorb $k$ into one of the binomial coefficients by using one of the absorption identities:

$$\sum_k\binom{n}{k}\binom{s}{k}k = \sum_k\binom{n}{k}\binom{s-1}{k-1}s.$$

We don’t care that the $s$ appears when the $k$ disappears, because it’s constant. And now we’re ready to apply the identity and get the closed form,

$$\sum_k\binom{n}{k}\binom{s-1}{k-1}s = s\binom{n+s-1}{n-1}.$$

If we had chosen in the first step to absorb $k$ into $\binom{n}{k}$, not $\binom{s}{k}$, we wouldn’t have been allowed to apply (5.23) directly, because $n-1$ might be negative; the identity requires a nonnegative value in at least one of the upper indices.
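Since the closed form $s\binom{n+s-1}{n-1}$ is claimed for any $s$ (only $n$ must be a nonnegative integer), a check at negative and fractional $s$ is a reasonable test. A minimal sketch of our own, assuming Python's `fractions` module; the helper names are hypothetical.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient for rational r and integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def problem5_sum(n, s):
    """Sum over k of binom(n, k) binom(s, k) k; for integer n >= 0 only 0 <= k <= n contribute."""
    return sum(binom(n, k) * binom(s, k) * k for k in range(n + 1))


def problem5_closed_form(n, s):
    """s * binom(n + s - 1, n - 1)."""
    return Fraction(s) * binom(n + s - 1, n - 1)


print(problem5_sum(3, 4), problem5_closed_form(3, 4))  # 60 60
```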


Problem 6: A sum with menacing characteristics. 

The next sum is more challenging. We seek a closed form for

$$\sum_k\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad \text{integer } n\ge 0.$$


One useful measure of a sum’s difficulty is the number of times the index of 
summation appears. By this measure we’re in deep trouble — k appears six 
times. Furthermore, the key step that worked in the previous problem — to 
absorb something outside the binomial coefficients into one of them — won’t 
work here. If we absorb the k + 1 we just get another occurrence of k in its 
place. And not only that: Our index k is twice shackled with the coefficient 2 
inside a binomial coefficient. Multiplicative constants are usually harder to 
remove than additive constants. 





We’re lucky this time, though. The $2k$’s are right where we need them for identity (5.21) to apply, so we get

$$\sum_k\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = \sum_k\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1}.$$

The two 2’s disappear, and so does one occurrence of $k$. So that’s one down and five to go.

The $k+1$ in the denominator is the most troublesome characteristic left, and now we can absorb it into $\binom{n}{k}$ using identity (5.6):

$$\sum_k\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1} = \frac{1}{n+1}\sum_k\binom{n+k}{k}\binom{n+1}{k+1}(-1)^k.$$

(Recall that $n\ge 0$.) Two down, four to go.

To eliminate another $k$ we have two promising options. We could use symmetry on $\binom{n+k}{k}$; or we could negate the upper index $n+k$, thereby eliminating that $k$ as well as the factor $(-1)^k$. Let’s explore both possibilities, starting with the symmetry option:

$$\frac{1}{n+1}\sum_k\binom{n+k}{k}\binom{n+1}{k+1}(-1)^k = \frac{1}{n+1}\sum_k\binom{n+k}{n}\binom{n+1}{k+1}(-1)^k.$$

Third down, three to go, and we’re in position to make a big gain by plugging into (5.24): Replacing $(l, m, n, s)$ by $(n+1, 1, n, n)$, we get

$$\frac{1}{n+1}\sum_k\binom{n+1}{k+1}\binom{n+k}{n}(-1)^k = \frac{1}{n+1}(-1)^n\binom{n-1}{-1} = 0.$$


For a minute 
I thought we’d 
have to punt. 


Zero, eh? After all that work? Let’s check it when $n = 2$: $\binom{2}{0}\binom{0}{0}\frac11 - \binom{3}{2}\binom{2}{1}\frac12 + \binom{4}{4}\binom{4}{2}\frac13 = \frac11 - \frac62 + \frac63 = 0$. It checks.

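Just for the heck of it, let’s explore our other option, negating the upper index of $\binom{n+k}{k}$:

$$\frac{1}{n+1}\sum_k\binom{n+k}{k}\binom{n+1}{k+1}(-1)^k = \frac{1}{n+1}\sum_k\binom{-n-1}{k}\binom{n+1}{k+1}.$$

Now (5.23) applies, with $(l, m, n, s) \leftarrow (n+1, 1, 0, -n-1)$, and

$$\frac{1}{n+1}\sum_k\binom{n+1}{k+1}\binom{-n-1}{k} = \frac{1}{n+1}\binom{0}{n}.$$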

n+1 
Try binary search: Replay the middle formula first, to see if the mistake was early or late.


Hey wait. This is zero when n > 0, but it’s 1 when n = 0. Our other 
path to the solution told us that the sum was zero in all cases! What gives? 
The sum actually does turn out to be 1 when n = 0, so the correct answer is 
‘[n = 0]’. We must have made a mistake in the previous derivation. 

Let’s do an instant replay on that derivation when $n = 0$, in order to see where the discrepancy first arises. Ah yes; we fell into the old trap mentioned earlier: We tried to apply symmetry when the upper index could be negative! We were not justified in replacing $\binom{n+k}{k}$ by $\binom{n+k}{n}$ when $k$ ranges over all integers, because this converts zero into a nonzero value when $k < -n$. (Sorry about that.)

The other factor in the sum, $\binom{n+1}{k+1}$, turns out to be zero when $k < -n$, except when $n = 0$ and $k = -1$. Hence our error didn’t show up when we checked the case $n = 2$. Exercise 6 explains what we should have done.
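The diagnosis above can be reproduced numerically: evaluate the original sum, the form after absorption, and the form after the illegal symmetry step, and compare them at $n = 0$. This sketch is ours, assuming Python's `fractions` module; the function names are hypothetical, and the sums over "all $k$" are truncated to $-1\le k\le n$, the only range where $\binom{n+1}{k+1}$ is nonzero.

```python
from fractions import Fraction


def binom(r, k):
    """Generalized binomial coefficient: rational r, integer k (0 when k < 0)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result = result * (Fraction(r) - i) / (i + 1)
    return result


def sign(k):
    """(-1)**k as an exact integer, also for negative k."""
    return 1 if k % 2 == 0 else -1


def original_sum(n):
    """Sum over 0 <= k <= n of binom(n+k, 2k) binom(2k, k) (-1)^k / (k+1); other k give 0."""
    return sum(binom(n + k, 2 * k) * binom(2 * k, k) * Fraction(sign(k), k + 1) for k in range(n + 1))


def after_absorption(n):
    """(1/(n+1)) * sum over k of binom(n+k, k) binom(n+1, k+1) (-1)^k."""
    return sum(binom(n + k, k) * binom(n + 1, k + 1) * sign(k) for k in range(-1, n + 1)) / (n + 1)


def after_bad_symmetry(n):
    """Same sum with binom(n+k, k) replaced by binom(n+k, n): not allowed when n + k < 0."""
    return sum(binom(n + k, n) * binom(n + 1, k + 1) * sign(k) for k in range(-1, n + 1)) / (n + 1)


print([int(original_sum(n)) for n in range(6)])    # [1, 0, 0, 0, 0, 0]
print(after_absorption(0), after_bad_symmetry(0))  # 1 0
```

The $k = -1$ term is exactly where the two versions part company: $\binom{-1}{-1} = 0$, but $\binom{-1}{0} = 1$.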


Problem 7: A new obstacle.

This one’s even tougher; we want a closed form for

$$\sum_{k\ge 0}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad \text{integers } m, n > 0.$$


If $m$ were 0 we’d have the sum from the problem we just finished. But it’s not, and we’re left with a real mess; nothing we used in Problem 6 works here. (Especially not the crucial first step.)

However, if we could somehow get rid of the $m$, we could use the result just derived. So our strategy is: Replace $\binom{n+k}{m+2k}$ by a sum of terms like $\binom{l+k}{2k}$ for some nonnegative integer $l$; the summand will then look like the summand in Problem 6, and we can interchange the order of summation.

What should we substitute for $\binom{n+k}{m+2k}$? A painstaking examination of the identities derived earlier in this chapter turns up only one suitable candidate, namely equation (5.26) in Table 169. And one way to use it is to replace the parameters $(l, m, n, q, k)$ by $(n+k-1, 2k, m-1, 0, j)$, respectively:



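$$\begin{aligned}
\sum_{k\ge0}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
&= \sum_{k\ge0}\;\sum_{0\le j\le n+k-1}\binom{n+k-1-j}{2k}\binom{j}{m-1}\binom{2k}{k}\frac{(-1)^k}{k+1}\\
&= \sum_{j\ge0}\binom{j}{m-1}\sum_{\substack{k\ge j-n+1\\ k\ge0}}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}.
\end{aligned}$$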


In the last step we’ve changed the order of summation, manipulating the
conditions below the Σ’s according to the rules of Chapter 2.

We can’t quite replace the inner sum using the result of Problem 6,
because it has the extra condition k ≥ j − n + 1. But this extra condition
is superfluous unless j − n + 1 > 0; that is, unless j ≥ n. And when j ≥ n,
the first binomial coefficient of the inner sum is zero, because its upper index
is between 0 and k − 1, thus strictly less than the lower index 2k. We may
therefore place the additional restriction j < n on the outer sum, without
affecting which nonzero terms are included. This makes the restriction
k ≥ j − n + 1 superfluous, and we can use the result of Problem 6. The double
sum now comes tumbling down:


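$$\begin{aligned}
\sum_{j\ge0}\binom{j}{m-1}\sum_{\substack{k\ge j-n+1\\ k\ge0}}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
&= \sum_{0\le j<n}\binom{j}{m-1}\sum_{k\ge0}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}\\
&= \sum_{0\le j<n}\binom{j}{m-1}\,[n-1-j=0] = \binom{n-1}{m-1}.
\end{aligned}$$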


The inner sums vanish except when j = n − 1, so we get a simple closed form
as our answer.
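The closed form is easy to spot-check. This Python sketch is our own, and `problem7_sum` is a hypothetical helper name; it compares the original sum with $\binom{n-1}{m-1}$ for small positive m and n using exact fractions. Only k ≤ n can contribute, since $\binom{n+k}{m+2k} = 0$ once m + 2k > n + k.

```python
from fractions import Fraction
from math import comb

def problem7_sum(m, n):
    """Exact value of sum_{k>=0} C(n+k, m+2k) C(2k, k) (-1)^k / (k+1)."""
    # C(n+k, m+2k) vanishes once m+2k > n+k, so k = 0..n covers every nonzero term.
    return sum(Fraction(comb(n + k, m + 2 * k) * comb(2 * k, k) * (-1) ** k, k + 1)
               for k in range(n + 1))

# Compare with the closed form C(n-1, m-1) for small positive m and n.
for m in range(1, 7):
    for n in range(1, 10):
        assert problem7_sum(m, n) == comb(n - 1, m - 1), (m, n)
```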


Problem 8: A different obstacle. 

Let’s branch out from Problem 6 in another way by considering the sum

$$S_m = \sum_{k\ge0}\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1+m}, \qquad \text{integers } m, n\ge0.$$

Again, when m = 0 we have the sum we did before; but now the m occurs
in a different place. This problem is a bit harder yet than Problem 7, but
(fortunately) we’re getting better at finding solutions. We can begin as in
Problem 6,

$$S_m = \sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1+m}.$$


Now (as in Problem 7) we try to expand the part that depends on m into
terms that we know how to deal with. When m was zero, we absorbed k + 1
into $\binom{n}{k}$; if m > 0, we can do the same thing if we expand 1/(k + 1 + m) into
absorbable terms. And our luck still holds: We proved a suitable identity

$$\sum_{k\ge0}\binom{m}{k}\binom{r}{k}^{-1} = \frac{r+1}{r+1-m}, \qquad \text{integer } m\ge0,\ r\notin\{0,1,\ldots,m-1\}, \tag{5.33}$$

in Problem 1. Replacing r by −k − 2 gives the desired expansion,

$$S_m = \sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1}\sum_{j\ge0}\binom{m}{j}\binom{-k-2}{j}^{-1}.$$




Now the $(k+1)^{-1}$ can be absorbed into $\binom{n}{k}$, as planned. In fact, it could
also be absorbed into $\binom{-k-2}{j}^{-1}$. Double absorption suggests that even more
cancellation might be possible behind the scenes. Yes: expanding everything
in our new summand into factorials and going back to binomial coefficients
gives a formula that we can sum on k:

They expect us to
check this
on a sheet of
scratch paper.

$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!}\sum_{j\ge0}(-1)^j\binom{m+n+1}{n+1+j}\sum_k\binom{n+1+j}{k+j+1}\binom{-n-1}{k}\\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{j\ge0}(-1)^j\binom{m+n+1}{n+1+j}\binom{j}{n}.
\end{aligned}$$


The sum over all integers j is zero, by (5.24). Hence $-S_m$ is the sum for j < 0.
To evaluate $-S_m$ for j < 0, let’s replace j by −k − 1 and sum for k ≥ 0:

$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!}\sum_{k\ge0}(-1)^k\binom{m+n+1}{n-k}\binom{-k-1}{n}\\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le n}(-1)^{n-k}\binom{m+n+1}{k}\binom{k-n-1}{n}\\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le n}(-1)^{k}\binom{m+n+1}{k}\binom{2n-k}{n}\\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le 2n}(-1)^{k}\binom{m+n+1}{k}\binom{2n-k}{n}.
\end{aligned}$$


Finally (5.25) applies, and we have our answer:

$$S_m = (-1)^n\binom{m}{n}\frac{m!\,n!}{(m+n+1)!} = (-1)^n\,m^{\underline{n}}\,m^{\underline{-n-1}}.$$

Whew; we’d better check it. When n = 2 we find

$$S_m = \frac{1}{m+1} - \frac{6}{m+2} + \frac{6}{m+3} = \frac{m(m-1)}{(m+1)(m+2)(m+3)}.$$

Our derivation requires m to be an integer, but the result holds for all real m,
because the quantity $(m+1)^{\overline{n+1}}S_m$ is a polynomial in m of degree ≤ n.
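Here is a sketch (our own, with hypothetical helper names `S` and `S_closed`) that checks the answer with exact fractions. It evaluates $S_m$ directly, compares it with $(-1)^n m^{\underline{n}} m^{\underline{-n-1}}$, and, to illustrate the remark about real m, also tries the arbitrary non-integer value m = 1/3.

```python
from fractions import Fraction
from math import comb

def S(m, n):
    """Exact S_m = sum_{k>=0} C(n+k, 2k) C(2k, k) (-1)^k / (k+1+m); m may be a Fraction."""
    return sum(Fraction(comb(n + k, 2 * k) * comb(2 * k, k) * (-1) ** k) / (k + 1 + m)
               for k in range(n + 1))

def S_closed(m, n):
    """(-1)^n m(m-1)...(m-n+1) / ((m+1)(m+2)...(m+n+1)), i.e. (-1)^n m^(n falling) m^(-n-1 falling)."""
    numerator = Fraction(1)
    for i in range(n):
        numerator *= m - i
    denominator = Fraction(1)
    for i in range(1, n + 2):
        denominator *= m + i
    return (-1) ** n * numerator / denominator

for m in range(6):
    for n in range(6):
        assert S(m, n) == S_closed(m, n)

# The closed form also matches for a non-integer m, e.g. m = 1/3:
assert all(S(Fraction(1, 3), n) == S_closed(Fraction(1, 3), n) for n in range(6))
```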
5.3 TRICKS OF THE TRADE

Let’s look next at three techniques that significantly amplify the 
methods we have already learned. 

Trick 1: Going halves. 

Many of our identities involve an arbitrary real number r. When r has 
the special form “integer minus one half,” the binomial coefficient $\binom{r}{k}$ can be
written as a quite different-looking product of binomial coefficients. This leads 
to a new family of identities that can be manipulated with surprising ease. 
One way to see how this works is to begin with the duplication formula 

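$$r^{\underline{k}}\,\bigl(r-\tfrac12\bigr)^{\underline{k}} = (2r)^{\underline{2k}}/2^{2k}, \qquad \text{integer } k\ge0. \tag{5.34}$$

This identity is obvious if we expand the falling powers and interleave the
factors on the left side:

$$r\bigl(r-\tfrac12\bigr)(r-1)\bigl(r-\tfrac32\bigr)\ldots(r-k+1)\bigl(r-k+\tfrac12\bigr) = \frac{(2r)(2r-1)\ldots(2r-2k+1)}{2\cdot2\cdot\ldots\cdot2}.$$

Now we can divide both sides by $k!^2$, and we get

$$\binom{r}{k}\binom{r-1/2}{k} = \binom{2r}{2k}\binom{2k}{k}\Big/2^{2k}, \qquad \text{integer } k. \tag{5.35}$$

If we set k = r = n, where n is an integer, this yields

$$\binom{n-1/2}{n} = \binom{2n}{n}\Big/2^{2n}, \qquad \text{integer } n. \tag{5.36}$$

And negating the upper index gives yet another useful formula,

$$\binom{-1/2}{n} = \left(\frac{-1}{4}\right)^{n}\binom{2n}{n}, \qquad \text{integer } n. \tag{5.37}$$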

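For example, when n = 4 we have

$$\binom{-1/2}{4} = \frac{(-1/2)(-3/2)(-5/2)(-7/2)}{4!} = \left(\frac{-1}{2}\right)^{4}\frac{1\cdot3\cdot5\cdot7}{1\cdot2\cdot3\cdot4} = \left(\frac{-1}{4}\right)^{4}\frac{1\cdot3\cdot5\cdot7\cdot2\cdot4\cdot6\cdot8}{1\cdot2\cdot3\cdot4\cdot1\cdot2\cdot3\cdot4} = \left(\frac{-1}{4}\right)^{4}\binom{8}{4}.$$

This should really
be called Trick 1/2.

... we halve ...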


Notice how we’ve changed a product of odd numbers into a factorial. 
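To see (5.35) through (5.37) in action, here is a small Python sketch of our own. The helper `gbinom` is a hypothetical name for the generalized binomial coefficient $\binom{r}{k} = r^{\underline{k}}/k!$ with rational r, computed exactly with `fractions.Fraction`; the test value r = 7/3 is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def gbinom(r, k):
    """Generalized binomial coefficient C(r, k) for rational r and integer k."""
    if k < 0:
        return Fraction(0)
    r = Fraction(r)
    result = Fraction(1)
    for i in range(k):
        result = result * (r - i) / (i + 1)   # builds r(r-1)...(r-k+1) / k!
    return result

half = Fraction(1, 2)
for n in range(10):
    # (5.36): C(n - 1/2, n) = C(2n, n) / 2^(2n)
    assert gbinom(n - half, n) == Fraction(comb(2 * n, n), 4 ** n)
    # (5.37): C(-1/2, n) = (-1/4)^n C(2n, n)
    assert gbinom(-half, n) == Fraction(-1, 4) ** n * comb(2 * n, n)

# (5.35) for a non-half-integer r (r = 7/3 is our arbitrary choice)
r = Fraction(7, 3)
for k in range(8):
    assert gbinom(r, k) * gbinom(r - half, k) == gbinom(2 * r, 2 * k) * comb(2 * k, k) / 4 ** k
```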
Identity (5.35) has an amusing corollary. Let r = n/2, and take the sum
over all integers k. The result is

$$\sum_k\binom{n}{2k}\binom{2k}{k}2^{-2k} = \sum_k\binom{n/2}{k}\binom{(n-1)/2}{k} = \binom{n-1/2}{\lfloor n/2\rfloor}, \qquad \text{integer } n\ge0, \tag{5.38}$$

by (5.23), because either n/2 or (n − 1)/2 is ⌊n/2⌋, a nonnegative integer!
We can also use Vandermonde’s convolution (5.27) to deduce that

$$\sum_k\binom{-1/2}{k}\binom{-1/2}{n-k} = \binom{-1}{n} = (-1)^n, \qquad \text{integer } n\ge0.$$


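Plugging in the values from (5.37) gives

$$\binom{-1/2}{k}\binom{-1/2}{n-k} = \left(\frac{-1}{4}\right)^{k}\binom{2k}{k}\left(\frac{-1}{4}\right)^{n-k}\binom{2n-2k}{n-k} = \frac{(-1)^n}{4^n}\binom{2k}{k}\binom{2n-2k}{n-k};$$

this is what sums to $(-1)^n$. Hence we have a remarkable property of the
“middle” elements of Pascal’s triangle:

$$\sum_k\binom{2k}{k}\binom{2n-2k}{n-k} = 4^n, \qquad \text{integer } n\ge0. \tag{5.39}$$

For example, $\binom{0}{0}\binom{6}{3} + \binom{2}{1}\binom{4}{2} + \binom{4}{2}\binom{2}{1} + \binom{6}{3}\binom{0}{0} = 1\cdot20 + 2\cdot6 + 6\cdot2 + 20\cdot1 = 64 = 4^3$.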
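The same check can be run for several n at once. In this minimal Python sketch (the function name `middle_convolution` is our own), we sum products of central binomial coefficients as in (5.39); the list it prints is just the first few powers of 4.

```python
from math import comb

def middle_convolution(n):
    """Left side of (5.39): sum_k C(2k, k) C(2n-2k, n-k)."""
    return sum(comb(2 * k, k) * comb(2 * (n - k), n - k) for k in range(n + 1))

print([middle_convolution(n) for n in range(6)])  # [1, 4, 16, 64, 256, 1024]
```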

These illustrations of our first trick indicate that it’s wise to try changing
binomial coefficients of the form $\binom{2k}{k}$ into binomial coefficients of the form
$\binom{n-1/2}{k}$, where n is some appropriate integer (usually 0, 1, or k); the resulting
formula might be much simpler.

Trick 2: High-order differences. 

We saw earlier that it’s possible to evaluate partial sums of the series
$\binom{n}{k}(-1)^k$, but not of the series $\binom{n}{k}$. It turns out that there are many important
applications of binomial coefficients with alternating signs, $\binom{n}{k}(-1)^k$. One of
the reasons for this is that such coefficients are intimately associated with the
difference operator Δ defined in Section 2.6.

The difference Δf of a function f at the point x is

$$\Delta f(x) = f(x+1) - f(x);$$
if we apply Δ again, we get the second difference

$$\Delta^2 f(x) = \Delta f(x+1) - \Delta f(x) = \bigl(f(x+2)-f(x+1)\bigr) - \bigl(f(x+1)-f(x)\bigr) = f(x+2) - 2f(x+1) + f(x),$$

which is analogous to the second derivative. Similarly, we have

$$\begin{aligned}
\Delta^3 f(x) &= f(x+3) - 3f(x+2) + 3f(x+1) - f(x);\\
\Delta^4 f(x) &= f(x+4) - 4f(x+3) + 6f(x+2) - 4f(x+1) + f(x);
\end{aligned}$$

and so on. Binomial coefficients enter these formulas with alternating signs.
In general, the nth difference is

$$\Delta^n f(x) = \sum_k\binom{n}{k}(-1)^{n-k}f(x+k), \qquad \text{integer } n\ge0. \tag{5.40}$$

This formula is easily proved by induction, but there’s also a nice way to prove 
it directly using the elementary theory of operators. Recall that Section 2.6 
defines the shift operator E by the rule 

$$Ef(x) = f(x+1);$$

hence the operator Δ is E − 1, where 1 is the identity operator defined by the
rule 1f(x) = f(x). By the binomial theorem,

$$\Delta^n = (E-1)^n = \sum_k\binom{n}{k}E^k(-1)^{n-k}.$$

This is an equation whose elements are operators; it is equivalent to (5.40),
since $E^k$ is the operator that takes f(x) into f(x + k).
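Formula (5.40) can be compared with repeated application of Δ. The Python sketch below is illustrative only; the names `delta`, `delta_power`, and `delta_power_by_binomials` are ours, and the test polynomial is arbitrary. It also shows that the sixth difference of a fifth-degree polynomial vanishes.

```python
from math import comb

def delta(f):
    """Difference operator: returns the function x -> f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def delta_power(f, n):
    """Apply the difference operator n times to f."""
    for _ in range(n):
        f = delta(f)
    return f

def delta_power_by_binomials(f, n, x):
    """Right side of (5.40): sum_k C(n, k) (-1)^(n-k) f(x + k)."""
    return sum(comb(n, k) * (-1) ** (n - k) * f(x + k) for k in range(n + 1))

def poly(x):
    return x ** 5 - 3 * x ** 2 + 7   # an arbitrary test polynomial of degree 5

for n in range(8):
    for x in range(-3, 4):
        assert delta_power(poly, n)(x) == delta_power_by_binomials(poly, n, x)
```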

An interesting and important case arises when we consider negative
falling powers. Let $f(x) = (x-1)^{\underline{-1}} = 1/x$. Then, by rule (2.45), we have
$\Delta f(x) = (-1)(x-1)^{\underline{-2}}$, $\Delta^2 f(x) = (-1)(-2)(x-1)^{\underline{-3}}$, and in general

$$\Delta^n\bigl((x-1)^{\underline{-1}}\bigr) = (-1)^{\underline{n}}\,(x-1)^{\underline{-n-1}} = (-1)^n\frac{n!}{x(x+1)\ldots(x+n)}.$$

Equation (5.40) now tells us that

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\ldots(x+n)} = x^{-1}\binom{x+n}{n}^{-1}, \qquad x\notin\{0,-1,\ldots,-n\}. \tag{5.41}$$
For example,

$$\frac{1}{x} - \frac{4}{x+1} + \frac{6}{x+2} - \frac{4}{x+3} + \frac{1}{x+4} = \frac{4!}{x(x+1)(x+2)(x+3)(x+4)} = \frac{1}{x}\binom{x+4}{4}^{-1}.$$

The sum in (5.41) is the partial fraction expansion of $n!/\bigl(x(x+1)\ldots(x+n)\bigr)$.
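A short exact check of (5.41), as a sketch of our own (function names hypothetical): both sides are evaluated with `Fraction` at a few arbitrary points x chosen outside {0, −1, …, −n}.

```python
from fractions import Fraction
from math import comb, factorial

def alternating_reciprocal_sum(n, x):
    """Left side of (5.41): sum_k C(n, k) (-1)^k / (x + k)."""
    x = Fraction(x)
    return sum(Fraction(comb(n, k) * (-1) ** k) / (x + k) for k in range(n + 1))

def product_form(n, x):
    """Middle of (5.41): n! / (x (x+1) ... (x+n))."""
    x = Fraction(x)
    denominator = Fraction(1)
    for j in range(n + 1):
        denominator *= x + j
    return factorial(n) / denominator

for x in (Fraction(5, 2), Fraction(3), Fraction(-7, 3)):
    for n in range(9):
        assert alternating_reciprocal_sum(n, x) == product_form(n, x)
```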

Significant results can be obtained from positive falling powers too. If
f(x) is a polynomial of degree d, the difference Δf(x) is a polynomial of degree
d − 1; therefore $\Delta^d f(x)$ is a constant, and $\Delta^n f(x) = 0$ if n > d. This extremely
important fact simplifies many formulas.

A closer look gives further information: Let

$$f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_1x^1 + a_0x^0$$

be any polynomial of degree d. We will see in Chapter 6 that we can express
ordinary powers as sums of falling powers (for example, $x^2 = x^{\underline{2}} + x^{\underline{1}}$); hence
there are coefficients $b_d, b_{d-1}, \ldots, b_1, b_0$ such that

$$f(x) = b_d x^{\underline{d}} + b_{d-1}x^{\underline{d-1}} + \cdots + b_1x^{\underline{1}} + b_0x^{\underline{0}}.$$

(It turns out that $b_d = a_d$ and $b_0 = a_0$, but the intervening coefficients are
related in a more complicated way.) Let $c_k = k!\,b_k$ for 0 ≤ k ≤ d. Then

$$f(x) = c_d\binom{x}{d} + c_{d-1}\binom{x}{d-1} + \cdots + c_1\binom{x}{1} + c_0\binom{x}{0};$$

thus, any polynomial can be represented as a sum of multiples of binomial
coefficients. Such an expansion is called the Newton series of f(x), because
Isaac Newton used it extensively.

We observed earlier in this chapter that the addition formula implies

$$\Delta\binom{x}{k} = \binom{x}{k-1}.$$

Therefore, by induction, the nth difference of a Newton series is very simple:

$$\Delta^n f(x) = c_d\binom{x}{d-n} + c_{d-1}\binom{x}{d-1-n} + \cdots + c_1\binom{x}{1-n} + c_0\binom{x}{-n}.$$

If we now set x = 0, all terms $c_k\binom{0}{k-n}$ on the right side are zero, except the
term with k − n = 0; hence

$$\Delta^n f(0) = \begin{cases} c_n, & \text{if } n\le d;\\ 0, & \text{if } n > d.\end{cases}$$
The Newton series for f(x) is therefore

$$f(x) = \Delta^d f(0)\binom{x}{d} + \Delta^{d-1}f(0)\binom{x}{d-1} + \cdots + \Delta f(0)\binom{x}{1} + f(0)\binom{x}{0}.$$

For example, suppose $f(x) = x^3$. It’s easy to calculate

$$\begin{aligned}
&f(0) = 0, && f(1) = 1, && f(2) = 8, && f(3) = 27;\\
&\Delta f(0) = 1, && \Delta f(1) = 7, && \Delta f(2) = 19;\\
&\Delta^2 f(0) = 6, && \Delta^2 f(1) = 12;\\
&\Delta^3 f(0) = 6.
\end{aligned}$$

So the Newton series is $x^3 = 6\binom{x}{3} + 6\binom{x}{2} + 1\binom{x}{1} + 0\binom{x}{0}$.
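The difference table above is mechanical enough to automate. This Python sketch (our own; `newton_coefficients` and `newton_eval` are hypothetical names) takes f(0), …, f(d), repeatedly forms differences, and keeps the leading entry of each row, which gives $\Delta^k f(0)$; summing $\Delta^k f(0)\binom{x}{k}$ then reproduces f at nonnegative integers.

```python
from math import comb

def newton_coefficients(values):
    """Given [f(0), f(1), ..., f(d)], return [f(0), Df(0), D^2 f(0), ..., D^d f(0)],
    where D is the forward difference operator."""
    coeffs = []
    row = list(values)
    while row:
        coeffs.append(row[0])                       # leading entry of this difference row
        row = [b - a for a, b in zip(row, row[1:])]  # next row of differences
    return coeffs

def newton_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * C(x, k) at an integer x >= 0."""
    return sum(c * comb(x, k) for k, c in enumerate(coeffs))

coeffs = newton_coefficients([x ** 3 for x in range(4)])
print(coeffs)  # [0, 1, 6, 6]
```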

Our formula $\Delta^n f(0) = c_n$ can also be stated in the following way, using
(5.40) with x = 0:

$$\sum_k\binom{n}{k}(-1)^k\left(c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots\right) = (-1)^n c_n, \qquad \text{integer } n\ge0.$$

Here $(c_0, c_1, c_2, \ldots)$ is an arbitrary sequence of coefficients; the infinite sum
$c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots$ is actually finite for all k ≥ 0, so convergence is
not an issue. In particular, we can prove the important identity

$$\sum_k\binom{n}{k}(-1)^k\bigl(a_0 + a_1k + \cdots + a_nk^n\bigr) = (-1)^n n!\,a_n, \qquad \text{integer } n\ge0, \tag{5.42}$$

because the polynomial $a_0 + a_1k + \cdots + a_nk^n$ can always be written as a
Newton series $c_0\binom{k}{0} + c_1\binom{k}{1} + \cdots + c_n\binom{k}{n}$ with $c_n = n!\,a_n$.

Many sums that appear to be hopeless at first glance can actually be
summed almost trivially by using the idea of nth differences. For example,
let’s consider the identity

$$\sum_k\binom{n}{k}\binom{r-sk}{n}(-1)^k = s^n, \qquad \text{integer } n\ge0. \tag{5.43}$$

This looks very impressive, because it’s quite different from anything we’ve
seen so far. But it really is easy to understand, once we notice the telltale
factor $\binom{n}{k}(-1)^k$ in the summand, because the function

$$f(k) = \binom{r-sk}{n} = \frac{1}{n!}(-1)^n s^n k^n + \cdots$$

is a polynomial in k of degree n, with leading coefficient $(-1)^n s^n/n!$. Therefore
(5.43) is nothing more than an application of (5.42).
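Because both r and s may be arbitrary, (5.43) is a good candidate for an exact spot-check. In this sketch of ours, `gbinom` is a hypothetical helper for $\binom{r}{k}$ with rational upper index, and the values r = 5/3, s = −2/7 are arbitrary test choices.

```python
from fractions import Fraction
from math import comb

def gbinom(r, k):
    """Generalized binomial coefficient C(r, k) for rational r and integer k."""
    if k < 0:
        return Fraction(0)
    r = Fraction(r)
    result = Fraction(1)
    for i in range(k):
        result = result * (r - i) / (i + 1)
    return result

def lhs_5_43(n, r, s):
    """Left side of (5.43): sum_k C(n, k) C(r - s k, n) (-1)^k."""
    return sum(comb(n, k) * gbinom(r - s * k, n) * (-1) ** k for k in range(n + 1))

r, s = Fraction(5, 3), Fraction(-2, 7)
for n in range(8):
    assert lhs_5_43(n, r, s) == s ** n
```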

We have discussed Newton series under the assumption that f(x) is a
polynomial. But we’ve also seen that infinite Newton series

$$f(x) = c_0\binom{x}{0} + c_1\binom{x}{1} + c_2\binom{x}{2} + \cdots$$

make sense too, because such sums are always finite when x is a nonnegative
integer. Our derivation of the formula $\Delta^n f(0) = c_n$ works in the infinite case,
just as in the polynomial case; so we have the general identity

$$f(x) = f(0)\binom{x}{0} + \Delta f(0)\binom{x}{1} + \Delta^2 f(0)\binom{x}{2} + \Delta^3 f(0)\binom{x}{3} + \cdots, \qquad \text{integer } x\ge0. \tag{5.44}$$

This formula is valid for any function f(x) that is defined for nonnegative
integers x. Moreover, if the right-hand side converges for other values of x,
it defines a function that “interpolates” f(x) in a natural way. (There are
infinitely many ways to interpolate function values, so we cannot assert that
(5.44) is true for all x that make the infinite series converge. For example,
if we let f(x) = sin(πx), we have f(x) = 0 at all integer points, so the
right-hand side of (5.44) is identically zero; but the left-hand side is nonzero at all
noninteger x.)

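A Newton series is finite calculus’s answer to infinite calculus’s Taylor
series. Just as a Taylor series can be written

$$g(a+x) = \frac{g(a)}{0!}x^0 + \frac{g'(a)}{1!}x^1 + \frac{g''(a)}{2!}x^2 + \frac{g'''(a)}{3!}x^3 + \cdots,$$

the Newton series for f(x) = g(a + x) can be written

$$g(a+x) = \frac{g(a)}{0!}x^{\underline{0}} + \frac{\Delta g(a)}{1!}x^{\underline{1}} + \frac{\Delta^2 g(a)}{2!}x^{\underline{2}} + \frac{\Delta^3 g(a)}{3!}x^{\underline{3}} + \cdots. \tag{5.45}$$

(Since $E = 1 + \Delta$,
$E^x = \sum_k\binom{x}{k}\Delta^k$;
and $E^x g(a) = g(a+x)$.)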


(This is the same as (5.44), because $\Delta^n f(0) = \Delta^n g(a)$ for all n ≥ 0 when
f(x) = g(a + x).) Both the Taylor and Newton series are finite when g is a
polynomial, or when x = 0; in addition, the Newton series is finite when x is a
positive integer. Otherwise the sums may or may not converge for particular
values of x. If the Newton series converges when x is not a nonnegative integer,
it might actually converge to a value that’s different from g(a + x), because
the Newton series (5.45) depends only on the spaced-out function values g(a),
g(a + 1), g(a + 2), ... .
One example of a convergent Newton series is provided by the binomial
theorem. Let $g(x) = (1+z)^x$, where z is a fixed complex number such that
|z| < 1. Then $\Delta g(x) = (1+z)^{x+1} - (1+z)^x = z(1+z)^x$, hence $\Delta^n g(x) =
z^n(1+z)^x$. In this case the infinite Newton series

$$g(a+x) = \sum_n\Delta^n g(a)\binom{x}{n} = (1+z)^a\sum_n\binom{x}{n}z^n$$

converges to the “correct” value $(1+z)^{a+x}$, for all x.

James Stirling tried to use Newton series to generalize the factorial
function to noninteger values. First he found coefficients $S_n$ such that

$$x! = \sum_n S_n\binom{x}{n} = S_0\binom{x}{0} + S_1\binom{x}{1} + S_2\binom{x}{2} + \cdots \tag{5.46}$$

is an identity for x = 0, x = 1, x = 2, etc. But he discovered that the resulting
series doesn’t converge except when x is a nonnegative integer. So he tried
again, this time writing

$$\ln x! = \sum_n s_n\binom{x}{n} = s_0\binom{x}{0} + s_1\binom{x}{1} + s_2\binom{x}{2} + \cdots. \tag{5.47}$$

Now Δ(ln x!) = ln(x + 1)! − ln x! = ln(x + 1), hence

$$s_n = \Delta^n(\ln x!)\big|_{x=0} = \Delta^{n-1}\bigl(\ln(x+1)\bigr)\big|_{x=0} = \sum_k\binom{n-1}{k}(-1)^{n-1-k}\ln(k+1)$$

by (5.40). The coefficients are therefore $s_0 = s_1 = 0$; $s_2 = \ln 2$; $s_3 = \ln 3 -
2\ln 2 = \ln\frac34$; $s_4 = \ln 4 - 3\ln 3 + 3\ln 2 = \ln\frac{32}{27}$; etc. In this way Stirling obtained
a series that does converge (although he didn’t prove it); in fact, his series
converges for all x > −1. He was thereby able to evaluate $\frac12!$ satisfactorily.
Exercise 88 tells the rest of the story.
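Stirling’s coefficients are easy to tabulate in floating point. The sketch below is ours (names hypothetical); it computes $s_n$ from the last formula above and confirms that at a nonnegative integer x the series, which is then finite, reproduces ln x! (compared against `math.lgamma(x + 1)`). We deliberately do not test noninteger x, where the series converges but not quickly.

```python
import math
from math import comb

def stirling_s(n):
    """Coefficient s_n in ln x! = sum_n s_n C(x, n).

    Uses s_0 = 0 and s_n = sum_k C(n-1, k) (-1)^(n-1-k) ln(k+1) for n >= 1,
    i.e. the (n-1)st difference of ln(x+1) at x = 0, expanded by (5.40).
    """
    if n == 0:
        return 0.0
    return sum(comb(n - 1, k) * (-1) ** (n - 1 - k) * math.log(k + 1) for k in range(n))

def ln_factorial_series(x, terms):
    """Partial sum of Stirling's series (5.47) with the given number of terms."""
    return sum(stirling_s(k) * comb(x, k) for k in range(terms))

# For integer x >= 0 the terms with k > x vanish, so x + 1 terms give ln x! exactly
# (up to floating-point rounding).
for x in range(13):
    assert math.isclose(ln_factorial_series(x, x + 1), math.lgamma(x + 1), abs_tol=1e-9)
```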

Trick 3: Inversion. 

A special case of the rule (5.45) we’ve just derived for Newton’s series 
can be rewritten in the following way: 

$$g(n) = \sum_k\binom{n}{k}(-1)^k f(k) \iff f(n) = \sum_k\binom{n}{k}(-1)^k g(k). \tag{5.48}$$


“Forasmuch as these 
terms increase 
very fast, their 
differences will 
make a diverging 
progression, which 
hinders the ordinate 
of the parabola 
from approaching to 
the truth; therefore 
in this and the like 
cases, I interpolate 
the logarithms of 
the terms, whose 
differences consti- 
tute a series swiftly 
converging. " 

— J. Stirling [343] 


(Proofs of conver- 
gence were not 
invented until the 
nineteenth century.) 
Invert this:
‘zinb ppo’. 


This dual relationship between f and g is called an inversion formula; it’s
rather like the Möbius inversion formulas (4.56) and (4.61) that we
encountered in Chapter 4. Inversion formulas tell us how to solve “implicit
recurrences,” where an unknown sequence is embedded in a sum.

For example, g(n) might be a known function, and f(n) might be
unknown; and we might have found a way to show that $g(n) = \sum_k\binom{n}{k}(-1)^k f(k)$.
Then (5.48) lets us express f(n) as a sum of known values.

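We can prove (5.48) directly by using the basic methods at the beginning
of this chapter. If $g(n) = \sum_k\binom{n}{k}(-1)^k f(k)$ for all n ≥ 0, then

$$\begin{aligned}
\sum_k\binom{n}{k}(-1)^k g(k) &= \sum_k\binom{n}{k}(-1)^k\sum_j\binom{k}{j}(-1)^j f(j)\\
&= \sum_j f(j)\sum_k\binom{n}{k}\binom{k}{j}(-1)^{k+j}\\
&= \sum_j f(j)\sum_k\binom{n}{j}\binom{n-j}{k-j}(-1)^{k+j}\\
&= \sum_j f(j)\binom{n}{j}\sum_k\binom{n-j}{k}(-1)^k\\
&= \sum_j f(j)\binom{n}{j}\,[n-j=0] = f(n).
\end{aligned}$$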


The proof in the other direction is, of course, the same, because the relation 
between f and g is symmetric. 
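The symmetry is easy to see numerically: applying the transform twice returns the original values. A minimal Python sketch, with `binomial_transform` as our own name and an arbitrary made-up list standing in for f(0), …, f(7):

```python
from math import comb

def binomial_transform(values):
    """Map [f(0), ..., f(N)] to [g(0), ..., g(N)] with g(n) = sum_k C(n, k) (-1)^k f(k)."""
    return [sum(comb(n, k) * (-1) ** k * values[k] for k in range(n + 1))
            for n in range(len(values))]

f = [3, -1, 4, 1, -5, 9, 2, 6]        # arbitrary test values
g = binomial_transform(f)
assert binomial_transform(g) == f     # (5.48): the transform is its own inverse
```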

Let’s illustrate (5.48) by applying it to the “football victory problem”: 
A group of n fans of the winning football team throw their hats high into the 
air. The hats come back randomly, one hat to each of the n fans. How many 
ways h(n, k) are there for exactly k fans to get their own hats back?

For example, if n = 4 and if the hats and fans are named A, B, C, D,
the 4! = 24 possible ways for hats to land generate the following numbers of
rightful owners:

ABCD 4    BACD 2    CABD 1    DABC 0
ABDC 2    BADC 0    CADB 0    DACB 1
ACBD 2    BCAD 1    CBAD 2    DBAC 1
ACDB 1    BCDA 0    CBDA 1    DBCA 2
ADBC 1    BDAC 0    CDAB 0    DCAB 0
ADCB 2    BDCA 1    CDBA 0    DCBA 0

Therefore h(4, 4) = 1; h(4, 3) = 0; h(4, 2) = 6; h(4, 1) = 8; h(4, 0) = 9.
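The same counts come out of a brute-force enumeration. This Python sketch (ours; `fixed_point_counts` is a hypothetical name) lists all n! permutations with `itertools.permutations` and tallies how many hats return to their owners.

```python
from collections import Counter
from itertools import permutations

def fixed_point_counts(n):
    """Counter mapping k to the number of ways exactly k of n fans get their own hats."""
    fans = range(n)
    return Counter(sum(1 for fan, hat in zip(fans, perm) if fan == hat)
                   for perm in permutations(fans))

print(sorted(fixed_point_counts(4).items()))  # [(0, 9), (1, 8), (2, 6), (4, 1)]
```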
We can determine h(n, k) by noticing that it is the number of ways to
choose k lucky hat owners, namely $\binom{n}{k}$, times the number of ways to arrange
the remaining n − k hats so that none of them goes to the right owner, namely
h(n − k, 0). A permutation is called a derangement if it moves every item,
and the number of derangements of n objects is sometimes denoted by the
symbol ‘n¡’, read “n subfactorial.” Therefore h(n − k, 0) = (n − k)¡, and we
have the general formula

$$h(n,k) = \binom{n}{k}h(n-k,0) = \binom{n}{k}(n-k)\text{¡}.$$

(Subfactorial notation isn’t standard, and it’s not clearly a great idea; but
let’s try it awhile to see if we grow to like it. We can always resort to ‘$D_n$’ or
something, if ‘n¡’ doesn’t work out.)

Our problem would be solved if we had a closed form for n¡, so let’s see
what we can find. There’s an easy way to get a recurrence, because the sum
of h(n, k) for all k is the total number of permutations of n hats:

$$n! = \sum_k h(n,k) = \sum_k\binom{n}{k}(n-k)\text{¡} = \sum_k\binom{n}{k}k\text{¡}, \qquad \text{integer } n\ge0. \tag{5.49}$$

(We’ve changed k to n − k and $\binom{n}{n-k}$ to $\binom{n}{k}$ in the last step.) With this
implicit recurrence we can compute all the h(n, k)’s we like:

n     h(n,0)  h(n,1)  h(n,2)  h(n,3)  h(n,4)  h(n,5)  h(n,6)
0          1
1          0       1
2          1       0       1
3          2       3       0       1
4          9       8       6       0       1
5         44      45      20      10       0       1
6        265     264     135      40      15       0       1


For example, here’s how the row for n = 4 can be computed: The two
rightmost entries are obvious, since there’s just one way for all hats to land correctly,
and there’s no way for just three fans to get their own. (Whose hat would the
fourth fan get?) When k = 2 and k = 1, we can use our equation for h(n, k),
giving $h(4,2) = \binom{4}{2}h(2,0) = 6\cdot1 = 6$, and $h(4,1) = \binom{4}{1}h(3,0) = 4\cdot2 = 8$. We
can’t use this equation for h(4, 0); rather, we can, but it gives us $h(4,0) =
\binom{4}{0}h(4,0)$, which is true but useless. Taking another tack, we can use the
relation h(4, 0) + 8 + 6 + 0 + 1 = 4! to deduce that h(4, 0) = 9; this is the value
of 4¡. Similarly n¡ depends on the values of k¡ for k < n.


The art of math- 
ematics, as of life, 
is knowing which 
truths are useless. 
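The table can be generated by exactly this procedure. In the following sketch (our own; `subfactorials` and `h` are hypothetical names), the k = n term of (5.49) is the unknown n¡ itself, so each step solves for it from the earlier values.

```python
from math import comb, factorial

def subfactorials(N):
    """Derangement numbers 0..N from the implicit recurrence (5.49): n! = sum_k C(n, k) k-subfactorial."""
    d = []
    for n in range(N + 1):
        # The k = n term of (5.49) is the unknown value itself, so subtract the others from n!.
        d.append(factorial(n) - sum(comb(n, k) * d[k] for k in range(n)))
    return d

def h(n, k, d):
    """Number of hat arrangements with exactly k rightful owners: C(n, k) times subfactorial(n-k)."""
    return comb(n, k) * d[n - k]

d = subfactorials(6)
for n in range(7):
    print(n, [h(n, k, d) for k in range(n + 1)])
```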
Baseball fans: .367
is also Ty Cobb’s
lifetime batting
average, the all-time
record. Can this be
a coincidence?

(Hey wait, you’re
fudging. Cobb’s
average was
4191/11429 ≈
.366699, while
1/e ≈ .367879.
But maybe if
Wade Boggs has
a few really good
seasons. . . )


How can we solve a recurrence like (5.49)? Easy; it has the form of (5.48),
with g(n) = n! and f(k) = $(-1)^k\,k\text{¡}$. Hence its solution is

$$n\text{¡} = (-1)^n\sum_k\binom{n}{k}(-1)^k k!.$$

Well, this isn’t really a solution; it’s a sum that should be put into closed form
if possible. But it’s better than a recurrence. The sum can be simplified, since
k! cancels with a hidden k! in $\binom{n}{k}$, so let’s try that: We get

$$n\text{¡} = \sum_{0\le k\le n}\frac{n!}{(n-k)!}(-1)^{n+k} = n!\sum_{0\le k\le n}\frac{(-1)^k}{k!}. \tag{5.50}$$


The remaining sum converges rapidly to the number $\sum_{k\ge0}(-1)^k/k! = e^{-1}$.
In fact, the terms that are excluded from the sum are

$$\sum_{k>n}\frac{(-1)^k}{k!} = \frac{(-1)^{n+1}}{(n+1)!}\sum_{k\ge0}(-1)^k\frac{(n+1)!}{(k+n+1)!} = \frac{(-1)^{n+1}}{(n+1)!}\left(1 - \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} - \cdots\right),$$

and the parenthesized quantity lies between 1 and $1 - \frac{1}{n+2} = \frac{n+1}{n+2}$. Therefore
the difference between n¡ and n!/e is roughly 1/n in absolute value; more
precisely, it lies between 1/(n + 1) and 1/(n + 2). But n¡ is an integer.
Therefore it must be what we get when we round n!/e to the nearest integer,
if n > 0. So we have the closed form we seek:

$$n\text{¡} = \left\lfloor\frac{n!}{e} + \frac12\right\rfloor + [n=0]. \tag{5.51}$$
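Formula (5.51) can be compared with the exact sum (5.50). In this sketch (ours, with hypothetical function names), the exact version uses `Fraction`, while the rounded version uses double-precision floating point; we restrict the comparison to n ≤ 15, where n!/e is small enough that rounding errors are far below the gap of roughly 1/(n + 1) between n!/e and the nearest integer.

```python
import math
from fractions import Fraction

def subfactorial_exact(n):
    """(5.50): n! * sum_{0<=k<=n} (-1)^k / k!, evaluated with exact fractions."""
    total = sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))
    return int(math.factorial(n) * total)

def subfactorial_rounded(n):
    """(5.51): floor(n!/e + 1/2) + [n = 0], evaluated in double precision."""
    return math.floor(math.factorial(n) / math.e + 0.5) + (1 if n == 0 else 0)

for n in range(16):
    assert subfactorial_exact(n) == subfactorial_rounded(n), n
```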


This is the number of ways that no fan gets the right hat back. When 
n is large, it’s more meaningful to know the probability that this happens. 
If we assume that each of the n! arrangements is equally likely — because the 
hats were thrown extremely high — this probability is 

nj n!/e + 0 ( 1 ) 1 

— = j ~ - = -367. . . . 

n! n! e 

So when n gets large the probability that all hats are misplaced is almost 37%. 

Incidentally, recurrence (5.49) for subfactorials is exactly the same as
(5.46), the first recurrence considered by Stirling when he was trying to
generalize the factorial function. Hence $S_k = k$¡. These coefficients are so large,
it’s no wonder the infinite series (5.46) diverges for noninteger x.

Before leaving this problem, let’s look briefly at two interesting patterns
that leap out at us in the table of small h(n, k). First, it seems that the
numbers 1, 3, 6, 10, 15, ... below the all-0 diagonal are the triangular numbers.
This observation is easy to prove, since those table entries are the h(n, n − 2)’s,
and we have

$$h(n,n-2) = \binom{n}{n-2}2\text{¡} = \binom{n}{2}.$$

It also seems that the numbers in the first two columns differ by ±1. Is
this always true? Yes,

$$\begin{aligned}
h(n,0) - h(n,1) &= n\text{¡} - n(n-1)\text{¡}\\
&= n!\sum_{0\le k\le n}\frac{(-1)^k}{k!} - n(n-1)!\sum_{0\le k\le n-1}\frac{(-1)^k}{k!} = n!\,\frac{(-1)^n}{n!} = (-1)^n.
\end{aligned}$$

In other words, n¡ = n(n − 1)¡ + $(-1)^n$. This is a much simpler recurrence
for the derangement numbers than we had before.
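The simpler recurrence needs only the previous value. A minimal Python sketch (the name `subfactorials_simple` is our own), starting from 0¡ = 1:

```python
def subfactorials_simple(N):
    """Derangement numbers 0..N via subfactorial(n) = n * subfactorial(n-1) + (-1)^n."""
    d = [1]                        # 0-subfactorial = 1
    for n in range(1, N + 1):
        d.append(n * d[-1] + (-1) ** n)
    return d

print(subfactorials_simple(6))  # [1, 0, 1, 2, 9, 44, 265]
```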

Now let’s invert something else. If we apply inversion to the formula

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{1}{x}\binom{x+n}{n}^{-1}$$

that we derived in (5.41), we find

$$\frac{x}{x+n} = \sum_{k\ge0}\binom{n}{k}(-1)^k\binom{x+k}{k}^{-1}.$$

This is interesting, but not really new. If we negate the upper index in $\binom{x+k}{k}$,
we have merely discovered identity (5.33) again.

But inversion is the
source of smog.


5.4 GENERATING FUNCTIONS 

We come now to the most important idea in this whole book, the
notion of a generating function. An infinite sequence $(a_0, a_1, a_2, \ldots)$ that
we wish to deal with in some way can conveniently be represented as a power
series in an auxiliary variable z,

$$A(z) = a_0 + a_1z + a_2z^2 + \cdots = \sum_{k\ge0}a_kz^k. \tag{5.52}$$

It’s appropriate to use the letter z as the name of the auxiliary variable, be- 
cause we’ll often be thinking of z as a complex number. The theory of complex 
variables conventionally uses ‘z’ in its formulas; power series (a.k.a. analytic 
functions or holomorphic functions) are central to that theory. 





(See [223] for a 
discussion of the 
history and use- 
fulness of this 
notation.) 


We will be seeing lots of generating functions in subsequent chapters. 
Indeed, Chapter 7 is entirely devoted to them. Our present goal is simply to 
introduce the basic concepts, and to demonstrate the relevance of generating 
functions to the study of binomial coefficients. 

A generating function is useful because it’s a single quantity that repre- 
sents an entire infinite sequence. We can often solve problems by first setting 
up one or more generating functions, then by fooling around with those func- 
tions until we know a lot about them, and finally by looking again at the 
coefficients. With a little bit of luck, we’ll know enough about the function 
to understand what we need to know about its coefficients. 

If A(z) is any power series $\sum_{k\ge0}a_kz^k$, we find it convenient to write

$$[z^n]\,A(z) = a_n; \tag{5.53}$$

in other words, $[z^n]\,A(z)$ denotes the coefficient of $z^n$ in A(z).

Let A(z) be the generating function for $(a_0, a_1, a_2, \ldots)$ as in (5.52),
and let B(z) be the generating function for another sequence $(b_0, b_1, b_2, \ldots)$.
Then the product A(z)B(z) is the power series

$$\begin{aligned}
(a_0 + a_1z + a_2z^2 + \cdots)&(b_0 + b_1z + b_2z^2 + \cdots)\\
&= a_0b_0 + (a_0b_1 + a_1b_0)z + (a_0b_2 + a_1b_1 + a_2b_0)z^2 + \cdots;
\end{aligned}$$

the coefficient of $z^n$ in this product is

$$a_0b_n + a_1b_{n-1} + \cdots + a_nb_0 = \sum_{k=0}^{n}a_kb_{n-k}.$$

Therefore if we wish to evaluate any sum that has the general form

$$c_n = \sum_{k=0}^{n}a_kb_{n-k}, \tag{5.54}$$

and if we know the generating functions A(z) and B(z), we have
$c_n = [z^n]\,A(z)B(z)$.

The sequence $(c_n)$ defined by (5.54) is called the convolution of the
sequences $(a_n)$ and $(b_n)$; two sequences are “convolved” by forming the sums of
all products whose subscripts add up to a given amount. The gist of the
previous paragraph is that convolution of sequences corresponds to multiplication
of their generating functions.
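Concretely, multiplying two truncated power series means convolving their coefficient lists. In this sketch we assume finite Python lists of coefficients stand in for the series, and `convolve` is our own name; each output entry is exactly the sum in (5.54).

```python
def convolve(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a_k b_{n-k} of A(z) B(z), for finite coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj        # the product a_i z^i * b_j z^j lands on z^(i+j)
    return c

print(convolve([1, 2, 3], [4, 5]))  # (1 + 2z + 3z^2)(4 + 5z) gives [4, 13, 22, 15]
```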
Generating functions give us powerful ways to discover and/or prove
identities. For example, the binomial theorem tells us that $(1+z)^r$ is the
generating function for the sequence $\bigl(\binom{r}{0}, \binom{r}{1}, \binom{r}{2}, \ldots\bigr)$:

$$(1+z)^r = \sum_{k\ge0}\binom{r}{k}z^k.$$

Similarly,

$$(1+z)^s = \sum_{k\ge0}\binom{s}{k}z^k.$$

If we multiply these together, we get another generating function:

$$(1+z)^r(1+z)^s = (1+z)^{r+s}.$$

And now comes the punch line: Equating coefficients of $z^n$ on both sides of
this equation gives us

$$\sum_{k=0}^{n}\binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}.$$

We’ve discovered Vandermonde’s convolution, (5.27)!

That was nice and easy; let’s try another. This time we use $(1-z)^r$, which
is the generating function for the sequence $\bigl((-1)^n\binom{r}{n}\bigr) = \bigl(\binom{r}{0}, -\binom{r}{1}, \binom{r}{2}, \ldots\bigr)$.
Multiplying by $(1+z)^r$ gives another generating function whose coefficients
we know:

$$(1-z)^r(1+z)^r = (1-z^2)^r.$$

Equating coefficients of $z^n$ now gives the equation

$$\sum_{k=0}^{n}\binom{r}{k}\binom{r}{n-k}(-1)^k = (-1)^{n/2}\binom{r}{n/2}\,[n \text{ even}]. \tag{5.55}$$

We should check this on a small case or two. When n = 3, for example,
the result is

$$\binom{r}{0}\binom{r}{3} - \binom{r}{1}\binom{r}{2} + \binom{r}{2}\binom{r}{1} - \binom{r}{3}\binom{r}{0} = 0.$$

(5.27)! =
(5.27)(4.27)
(3.27)(2.27)
(1.27)(0.27)!.

Each positive term is cancelled by a corresponding negative term. And the
same thing happens whenever n is odd, in which case the sum isn’t very
interesting. But when n is even, say n = 2, we get a nontrivial sum that’s
different from Vandermonde’s convolution:

$$\binom{r}{0}\binom{r}{2} - \binom{r}{1}\binom{r}{1} + \binom{r}{2}\binom{r}{0} = 2\binom{r}{2} - r^2 = -r.$$

So (5.55) checks out fine when n = 2. It turns out that (5.30) is a special case
of our new identity (5.55).
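The check can also be run for many n and several non-integer r at once. In this sketch of ours, `gbinom` is a hypothetical helper for the generalized binomial coefficient, and the sample values of r are arbitrary.

```python
from fractions import Fraction

def gbinom(r, k):
    """Generalized binomial coefficient C(r, k) for rational r and integer k."""
    if k < 0:
        return Fraction(0)
    r = Fraction(r)
    result = Fraction(1)
    for i in range(k):
        result = result * (r - i) / (i + 1)
    return result

def lhs_5_55(r, n):
    """Coefficient of z^n in (1-z)^r (1+z)^r: sum_{k=0}^{n} C(r,k) C(r,n-k) (-1)^k."""
    return sum(gbinom(r, k) * gbinom(r, n - k) * (-1) ** k for k in range(n + 1))

def rhs_5_55(r, n):
    """Coefficient of z^n in (1-z^2)^r: (-1)^(n/2) C(r, n/2) if n is even, else 0."""
    if n % 2:
        return Fraction(0)
    return (-1) ** (n // 2) * gbinom(r, n // 2)

for r in (Fraction(7, 3), Fraction(-5), Fraction(1, 2)):
    for n in range(10):
        assert lhs_5_55(r, n) == rhs_5_55(r, n)
```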

Binomial coefficients also show up in some other generating functions, 
most notably the following important identities in which the lower index 
stays fixed and the upper index varies: 


If you have a high-
lighter pen, these
two equations have
got to be marked.

$$\frac{1}{(1-z)^{n+1}} = \sum_{k\ge0}\binom{n+k}{n}z^k, \qquad \text{integer } n\ge0; \tag{5.56}$$

$$\frac{z^n}{(1-z)^{n+1}} = \sum_{k\ge0}\binom{k}{n}z^k, \qquad \text{integer } n\ge0. \tag{5.57}$$

The second identity here is just the first one multiplied by $z^n$, that is, “shifted
right” by n places. The first identity is just a special case of the binomial
theorem in slight disguise: If we expand $(1-z)^{-n-1}$ by (5.13), the coefficient
of $z^k$ is $\binom{-n-1}{k}(-1)^k$, which can be rewritten as $\binom{k+n}{k}$ or $\binom{n+k}{n}$ by negating
the upper index. These special cases are worth noting explicitly, because they
arise so frequently in applications.

When n = 0 we get a special case of a special case, the geometric series:

$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + \cdots = \sum_{k\ge0}z^k.$$

This is the generating function for the sequence (1, 1, 1, ...), and it is
especially useful because the convolution of any other sequence with this one is
the sequence of sums: When $b_k = 1$ for all k, (5.54) reduces to

$$c_n = \sum_{k=0}^{n}a_k.$$

Therefore if A(z) is the generating function for the summands $(a_0, a_1, a_2, \ldots)$,
then A(z)/(1 − z) is the generating function for the sums $(c_0, c_1, c_2, \ldots)$.
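Since dividing by 1 − z just takes running sums, (5.56) can be watched emerging from the constant series 1. A minimal sketch, assuming truncated coefficient lists and using `itertools.accumulate` for the running sums:

```python
from itertools import accumulate
from math import comb

N = 10
series = [1] + [0] * (N - 1)              # the power series 1, truncated after z^(N-1)
for n in range(5):
    series = list(accumulate(series))     # multiply by 1/(1-z): coefficients become running sums
    # After n+1 multiplications we have 1/(1-z)^(n+1), whose coefficients are C(n+k, n) by (5.56).
    assert series == [comb(n + k, n) for k in range(N)]
```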

The problem of derangements, which we solved by inversion in connection 
with hats and football fans, can be resolved with generating functions in an 
interesting way. The basic recurrence 



$$n! = \sum_k\binom{n}{k}(n-k)\text{¡}$$

can be put into the form of a convolution if we expand $\binom{n}{k}$ in factorials and
divide both sides by n!:

$$1 = \sum_{k=0}^{n}\frac{1}{k!}\,\frac{(n-k)\text{¡}}{(n-k)!}.$$

The generating function for the sequence $\bigl(\frac{1}{0!}, \frac{1}{1!}, \frac{1}{2!}, \ldots\bigr)$ is $e^z$; hence if we let

$$D(z) = \sum_{k\ge0}\frac{k\text{¡}}{k!}z^k,$$

the convolution/recurrence tells us that

$$\frac{1}{1-z} = e^z D(z).$$

Solving for D(z) gives

$$D(z) = \frac{1}{1-z}e^{-z} = \frac{1}{1-z}\left(\frac{1}{0!}z^0 - \frac{1}{1!}z^1 + \frac{1}{2!}z^2 - \cdots\right).$$

Equating coefficients of $z^n$ now tells us that

$$\frac{n\text{¡}}{n!} = \sum_{k=0}^{n}\frac{(-1)^k}{k!};$$

this is the formula we derived earlier by inversion.
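The generating-function route can be traced with truncated series. In this sketch of ours, lists of `Fraction` coefficients stand in for the series: $e^{-z}$ divided by 1 − z becomes a list of running sums, multiplying by n! recovers n¡, and a convolution confirms $e^zD(z) = 1/(1-z)$ coefficient by coefficient.

```python
from fractions import Fraction
from itertools import accumulate
from math import factorial

N = 10
exp_minus_z = [Fraction((-1) ** k, factorial(k)) for k in range(N)]   # e^{-z}, truncated
D = list(accumulate(exp_minus_z))      # e^{-z}/(1-z): dividing by 1-z takes running sums
derangements = [int(factorial(n) * D[n]) for n in range(N)]           # n! [z^n] D(z)

# Check 1/(1-z) = e^z D(z): every coefficient of the truncated product should be 1.
exp_z = [Fraction(1, factorial(k)) for k in range(N)]
product = [sum(exp_z[k] * D[n - k] for k in range(n + 1)) for n in range(N)]
assert product == [1] * N

print(derangements)  # [1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496]
```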

So far our explorations with generating functions have given us slick 
proofs of things that we already knew how to derive by more cumbersome 
methods. But we haven’t used generating functions to obtain any new re- 
sults, except for (5.55). Now we’re ready for something new and more sur- 
prising. There are two families of power series that generate an especially rich 
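class of binomial coefficient identities: Let us define the generalized binomial
series $\mathcal{B}_t(z)$ and the generalized exponential series $\mathcal{E}_t(z)$ as follows:

$$\mathcal{B}_t(z) = \sum_{k\ge0}(tk)^{\underline{k-1}}\frac{z^k}{k!}; \qquad \mathcal{E}_t(z) = \sum_{k\ge0}(tk+1)^{k-1}\frac{z^k}{k!}. \tag{5.58}$$

It can be shown that these functions satisfy the identities

$$\mathcal{B}_t(z)^{1-t} - \mathcal{B}_t(z)^{-t} = z; \qquad \mathcal{E}_t(z)^{-t}\ln\mathcal{E}_t(z) = z. \tag{5.59}$$

In the special case t = 0, we have

$$\mathcal{B}_0(z) = 1 + z; \qquad \mathcal{E}_0(z) = e^z;$$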
The generalized bi-
nomial series $\mathcal{B}_t(z)$
was discovered in
the 1750s by J. H.
Lambert [236, §38],
who noticed a few
years later [237]
that its powers
satisfy the first
identity in (5.60).


this explains why the series with parameter t are called “generalized”
binomials and exponentials.

The following pairs of identities are valid for all real r:

$$\mathcal{B}_t(z)^r = \sum_{k\ge0}\binom{tk+r}{k}\frac{r}{tk+r}z^k; \qquad \frac{\mathcal{B}_t(z)^r}{1-t+t\mathcal{B}_t(z)^{-1}} = \sum_{k\ge0}\binom{tk+r}{k}z^k; \tag{5.60}$$

$$\mathcal{E}_t(z)^r = \sum_{k\ge0}r(tk+r)^{k-1}\frac{z^k}{k!}; \qquad \frac{\mathcal{E}_t(z)^r}{1-zt\,\mathcal{E}_t(z)^t} = \sum_{k\ge0}(tk+r)^k\frac{z^k}{k!}. \tag{5.61}$$

(When tk + r = 0, we have to be a little careful about how the coefficient
of $z^k$ is interpreted; each coefficient is a polynomial in r. For example, the
constant term of $\mathcal{E}_t(z)^r$ is $r(0+r)^{-1}$, and this is equal to 1 even when r = 0.)

Since equations (5.60) and (5.61) hold for all r, we get very general
identities when we multiply together the series that correspond to different powers
r and s. For example,

$$\begin{aligned}
\mathcal{B}_t(z)^r\,\frac{\mathcal{B}_t(z)^s}{1-t+t\mathcal{B}_t(z)^{-1}}
&= \sum_{k\ge0}\binom{tk+r}{k}\frac{r}{tk+r}z^k\;\sum_{j\ge0}\binom{tj+s}{j}z^j\\
&= \sum_{n\ge0}z^n\sum_{k}\binom{tk+r}{k}\frac{r}{tk+r}\binom{t(n-k)+s}{n-k}.
\end{aligned}$$

This power series must equal

$$\frac{\mathcal{B}_t(z)^{r+s}}{1-t+t\mathcal{B}_t(z)^{-1}} = \sum_{n\ge0}\binom{tn+r+s}{n}z^n;$$

hence we can equate coefficients of $z^n$ and get the identity

$$\sum_k\binom{tk+r}{k}\binom{t(n-k)+s}{n-k}\frac{r}{tk+r} = \binom{tn+r+s}{n}, \qquad \text{integer } n,$$

valid for all real r, s, and t. When t = 0 this identity reduces to
Vandermonde’s convolution. (If by chance tk + r happens to equal zero in this
formula, the denominator factor tk + r should be considered to cancel with
the tk + r in the numerator of the binomial coefficient. Both sides of the
identity are polynomials in r, s, and t.) Similar identities hold when we multiply
$\mathcal{B}_t(z)^r$ by $\mathcal{B}_t(z)^s$, etc.; Table 202 presents the results.
Table 202  General convolution identities, valid for integer $n\ge0$.

$$\sum_k\binom{tk+r}{k}\binom{tn-tk+s}{n-k}\frac{r}{tk+r}=\binom{tn+r+s}{n}. \tag{5.62}$$

$$\sum_k\binom{tk+r}{k}\binom{tn-tk+s}{n-k}\frac{r}{tk+r}\,\frac{s}{tn-tk+s}=\binom{tn+r+s}{n}\frac{r+s}{tn+r+s}. \tag{5.63}$$

$$\sum_k\binom{n}{k}(tk+r)^k\,(tn-tk+s)^{n-k}\,\frac{r}{tk+r}=(tn+r+s)^n. \tag{5.64}$$

$$\sum_k\binom{n}{k}(tk+r)^k\,(tn-tk+s)^{n-k}\,\frac{r}{tk+r}\,\frac{s}{tn-tk+s}=(tn+r+s)^n\,\frac{r+s}{tn+r+s}. \tag{5.65}$$
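The identities in Table 202 hold for all real $r$, $s$, $t$, so a spot check at a few rational parameter values is a cheap way to catch a misremembered factor. The sketch below is ours (function and variable names are hypothetical); it assumes the chosen parameters never make $tk+r$ or $tn-tk+s$ vanish, which sidesteps the cancellation convention described above.

```python
from fractions import Fraction
from math import comb


def binom(r, k):
    """C(r, k) for rational r and integer k; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (r - j) / (j + 1)
    return result


def table_202(t, r, s, n):
    """(lhs, rhs) pairs for (5.62), (5.63), (5.64), (5.65)."""
    A = lambda k: t * k + r            # tk + r
    B = lambda k: t * (n - k) + s      # tn - tk + s
    T = t * n + r + s                  # tn + r + s
    ks = range(n + 1)                  # all other k contribute zero
    e62 = (sum(binom(A(k), k) * binom(B(k), n - k) * r / A(k) for k in ks),
           binom(T, n))
    e63 = (sum(binom(A(k), k) * binom(B(k), n - k) * r / A(k) * s / B(k) for k in ks),
           binom(T, n) * (r + s) / T)
    e64 = (sum(comb(n, k) * A(k) ** k * B(k) ** (n - k) * r / A(k) for k in ks),
           T ** n)
    e65 = (sum(comb(n, k) * A(k) ** k * B(k) ** (n - k) * r / A(k) * s / B(k) for k in ks),
           T ** n * (r + s) / T)
    return [e62, e63, e64, e65]


r, s = Fraction(1, 3), Fraction(2, 5)
for t in (Fraction(3, 2), Fraction(-2, 7), 2):
    for n in range(6):
        assert all(lhs == rhs for lhs, rhs in table_202(t, r, s, n))
print("Table 202 identities hold for the sampled parameters")
```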


We have learned that it’s generally a good idea to look at special cases of 
general results. What happens, for example, if we set t = 1? The generalized 
binomial $\mathcal{B}_1(z)$ is very simple: it’s just

$$\mathcal{B}_1(z)=\sum_{k\ge0}z^k=\frac{1}{1-z};$$

therefore $\mathcal{B}_1(z)$ doesn’t give us anything we didn’t already know from Vandermonde’s convolution. But $\mathcal{E}_1(z)$ is an important function,

$$\mathcal{E}(z)=\sum_{k\ge0}(k+1)^{k-1}\frac{z^k}{k!}=1+z+\tfrac32z^2+\tfrac83z^3+\tfrac{125}{24}z^4+\cdots \tag{5.66}$$

which we haven’t seen before; it satisfies the basic identity

$$\mathcal{E}(z)=e^{z\,\mathcal{E}(z)}. \tag{5.67}$$

This function, first studied by Euler [117] and Eisenstein [91], arises in a great 
many applications [203, 193]. 
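A quick way to see that (5.66) and (5.67) agree is to exponentiate the power series $z\,\mathcal{E}(z)$ coefficient by coefficient. This Python sketch uses the standard recurrence $n\,g_n=\sum_{k=1}^n k f_k g_{n-k}$ for $g=e^f$ (it follows from $g'=f'g$); the truncation length `N` and the function names are our own choices.

```python
from fractions import Fraction
from math import factorial


def E_coeffs(N):
    """Coefficients of E(z) = sum_k (k+1)^(k-1) z^k / k!, as in (5.66)."""
    return [Fraction(k + 1) ** (k - 1) / factorial(k) for k in range(N)]


def series_exp(f):
    """exp(f(z)) as a truncated power series; requires f[0] == 0.

    Uses g' = f' g, i.e. n g_n = sum_{k=1}^{n} k f_k g_{n-k}.
    """
    g = [Fraction(1)] + [Fraction(0)] * (len(f) - 1)
    for n in range(1, len(f)):
        g[n] = sum(k * f[k] * g[n - k] for k in range(1, n + 1)) / n
    return g


N = 10
E = E_coeffs(N)
zE = [Fraction(0)] + E[:-1]             # coefficients of z * E(z)
print([str(c) for c in E[:5]])          # ['1', '1', '3/2', '8/3', '125/24']
print(series_exp(zE) == E)              # True: identity (5.67) through z^9
```

Truncation is harmless here because the coefficient of $z^n$ in $e^f$ depends only on $f_1,\ldots,f_n$.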

The special cases $t=2$ and $t=-1$ of the generalized binomial are of particular interest, because their coefficients occur again and again in problems that have a recursive structure. Therefore it’s useful to display these


Aha! This is the 
iterated power 
function 
$\mathcal{E}(\ln z)=z^{z^{z^{\cdot^{\cdot^{\cdot}}}}}$
that I’ve often 
wondered about. 

Zzzzzz. . . 
The power series for $\mathcal{B}_{1/2}(z)^r=\bigl(\sqrt{z^2+4}+z\bigr)^{2r}\big/4^r$ is noteworthy too.


series explicitly for future reference: 


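$$\mathcal{B}_2(z)=\sum_k\binom{2k}{k}\frac{z^k}{1+k}=\sum_k\binom{2k+1}{k}\frac{z^k}{1+2k}=\frac{1-\sqrt{1-4z}}{2z}. \tag{5.68}$$

$$\mathcal{B}_{-1}(z)=\sum_k\binom{1-k}{k}\frac{z^k}{1-k}=\sum_k\binom{2k-1}{k}\frac{(-z)^k}{1-2k}=\frac{1+\sqrt{1+4z}}{2}. \tag{5.69}$$

$$\mathcal{B}_2(z)^r=\sum_k\binom{2k+r}{k}\frac{r}{2k+r}\,z^k. \tag{5.70}$$

$$\mathcal{B}_{-1}(z)^r=\sum_k\binom{r-k}{k}\frac{r}{r-k}\,z^k. \tag{5.71}$$

$$\frac{\mathcal{B}_2(z)^r}{\sqrt{1-4z}}=\sum_k\binom{2k+r}{k}z^k. \tag{5.72}$$

$$\frac{\mathcal{B}_{-1}(z)^{r+1}}{\sqrt{1+4z}}=\sum_k\binom{r-k}{k}z^k. \tag{5.73}$$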


The coefficients $\binom{2n}{n}\frac{1}{n+1}$ of $\mathcal{B}_2(z)$ are called the Catalan numbers $C_n$, because Eugène Catalan wrote an influential paper about them in the 1830s [52]. The sequence begins as follows:

$$\begin{array}{c|ccccccccccc}
n&0&1&2&3&4&5&6&7&8&9&10\\\hline
C_n&1&1&2&5&14&42&132&429&1430&4862&16796
\end{array}$$


The coefficients of $\mathcal{B}_{-1}(z)$ are essentially the same, but there’s an extra 1 at the beginning and the other numbers alternate in sign: $\langle 1,1,-1,2,-5,14,\ldots\rangle$. Thus $\mathcal{B}_{-1}(z)=1+z\,\mathcal{B}_2(-z)$. We also have $\mathcal{B}_{-1}(z)=\mathcal{B}_2(-z)^{-1}$.
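The relations between $\mathcal{B}_2$ and $\mathcal{B}_{-1}$ can be confirmed directly from definition (5.58). In this sketch the coefficient lists are truncated at an arbitrary length `N`, and the helper names are ours; all three printed comparisons should be `True`.

```python
from fractions import Fraction
from math import comb, factorial


def falling(x, m):
    """Falling power for integer m >= -1, with x^(-1 falling) = 1/(x+1)."""
    if m == -1:
        return 1 / (x + 1)
    result = Fraction(1)
    for j in range(m):
        result *= x - j
    return result


def B(t, N):
    """First N coefficients of the generalized binomial series B_t(z), via (5.58)."""
    return [falling(Fraction(t * k), k - 1) / factorial(k) for k in range(N)]


N = 10
catalan = [comb(2 * n, n) // (n + 1) for n in range(N)]
b2, bm1 = B(2, N), B(-1, N)

# B_2(z) has the Catalan numbers as coefficients.
print(b2 == catalan)
# B_{-1}(z) = 1 + z B_2(-z): an extra leading 1, then alternating Catalan numbers.
print(bm1 == [1] + [(-1) ** k * catalan[k] for k in range(N - 1)])
# B_{-1}(z) * B_2(-z) = 1, so B_{-1}(z) = B_2(-z)^(-1).
product = [sum(bm1[i] * (-1) ** (n - i) * b2[n - i] for i in range(n + 1)) for n in range(N)]
print(product == [1] + [0] * (N - 1))
```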

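Let’s close this section by deriving an important consequence of (5.72) and (5.73), a relation that shows further connections between the functions $\mathcal{B}_{-1}(z)$ and $\mathcal{B}_2(-z)$:

$$\frac{\mathcal{B}_{-1}(z)^{n+1}-(-z)^{n+1}\,\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}}=\sum_{k\le n}\binom{n-k}{k}z^k.$$

This holds because the coefficient of $z^k$ in $(-z)^{n+1}\mathcal{B}_2(-z)^{n+1}/\sqrt{1+4z}$ is

$$\begin{aligned}
[z^k]\,\frac{(-z)^{n+1}\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}}
&=(-1)^{n+1}\,[z^{k-n-1}]\,\frac{\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}}\\
&=(-1)^{n+1}(-1)^{k-n-1}\,[z^{k-n-1}]\,\frac{\mathcal{B}_2(z)^{n+1}}{\sqrt{1-4z}}\\
&=(-1)^k\binom{2(k-n-1)+n+1}{k-n-1}=(-1)^k\binom{2k-n-1}{k-n-1}\\
&=(-1)^k\binom{2k-n-1}{k}=\binom{n-k}{k}=[z^k]\,\frac{\mathcal{B}_{-1}(z)^{n+1}}{\sqrt{1+4z}}
\end{aligned}$$

when $k>n$. The terms nicely cancel each other out. We can now use (5.68) and (5.69) to obtain the closed form

$$\sum_{k\le n}\binom{n-k}{k}z^k=\frac{1}{\sqrt{1+4z}}\Biggl(\biggl(\frac{1+\sqrt{1+4z}}{2}\biggr)^{n+1}-\biggl(\frac{1-\sqrt{1+4z}}{2}\biggr)^{n+1}\Biggr),\qquad\text{integer }n\ge0. \tag{5.74}$$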

(The special case $z=-1$ came up in Problem 3 of Section 5.2. Since the numbers $\frac12(1\pm\sqrt{-3})$ are sixth roots of unity, the sums $\sum_{k\le n}\binom{n-k}{k}(-1)^k$ have the periodic behavior we observed in that problem.) Similarly we can combine (5.70) with (5.71) to cancel the large coefficients and get

$$\sum_{k<n}\binom{n-k}{k}\frac{n}{n-k}\,z^k=\biggl(\frac{1+\sqrt{1+4z}}{2}\biggr)^n+\biggl(\frac{1-\sqrt{1+4z}}{2}\biggr)^n,\qquad\text{integer }n>0. \tag{5.75}$$
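Both closed forms can be checked without irrational arithmetic: with $s=\sqrt{1+4z}$, the odd powers of $s$ cancel in (5.74) and (5.75), so the right-hand sides are polynomials in $w=s^2=1+4z$. The following sketch (helper names are ours) compares both sides exactly; $z=1$ gives Fibonacci numbers and $z=-1$ shows the period-6 behavior mentioned above.

```python
from fractions import Fraction
from math import comb


def lhs_574(n, z):
    return sum(comb(n - k, k) * z ** k for k in range(n + 1))


def rhs_574(n, z):
    """Right side of (5.74). With s = sqrt(1+4z), only even powers of s survive in
    ((1+s)^(n+1) - (1-s)^(n+1)) / s, so the result is rational whenever z is."""
    m, w = n + 1, 1 + 4 * z                      # w = s**2
    return Fraction(2, 2 ** m) * sum(comb(m, j) * w ** ((j - 1) // 2) for j in range(1, m + 1, 2))


def lhs_575(n, z):
    return sum(Fraction(comb(n - k, k) * n, n - k) * z ** k for k in range(n))


def rhs_575(n, z):
    """Right side of (5.75), expanded the same way: ((1+s)^n + (1-s)^n) / 2^n."""
    w = 1 + 4 * z
    return Fraction(2, 2 ** n) * sum(comb(n, j) * w ** (j // 2) for j in range(0, n + 1, 2))


print([lhs_574(n, 1) for n in range(8)])    # [1, 1, 2, 3, 5, 8, 13, 21]
print([lhs_574(n, -1) for n in range(8)])   # [1, 1, 0, -1, -1, 0, 1, 1]
```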


5.5 HYPERGEOMETRIC FUNCTIONS 


The methods we’ve been applying to binomial coefficients are very 
effective, when they work, but we must admit that they often appear to be 
ad hoc — more like tricks than techniques. When we’re working on a problem, 
we often have many directions to pursue, and we might find ourselves going 
around in circles. Binomial coefficients are like chameleons, changing their 
appearance easily. Therefore it’s natural to ask if there isn’t some unifying 
principle that will systematically handle a great variety of binomial coefficient 
summations all at once. Fortunately, the answer is yes. The unifying principle 
is based on the theory of certain infinite sums called hypergeometric series. 


They’re even more 
versatile than 
chameleons; we 
can dissect them 
and put them 
back together in 
different ways. 
Anything that has
survived for cen- 
turies with such 
awesome notation 
must be really 
useful. 


The study of hypergeometric series was launched many years ago by Eu- 
ler, Gauss, and Riemann; such series, in fact, are still the subject of consid- 
erable research. But hypergeometrics have a somewhat formidable notation, 
which takes a little time to get used to. 

The general hypergeometric series is a power series in z with m + n 
parameters, and it is defined as follows in terms of rising factorial powers: 



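$$F\biggl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\biggm|z\biggr)=\sum_{k\ge0}\frac{a_1^{\overline k}\ldots a_m^{\overline k}}{b_1^{\overline k}\ldots b_n^{\overline k}}\,\frac{z^k}{k!}. \tag{5.76}$$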


To avoid division by zero, none of the $b$’s may be zero or a negative integer. Other than that, the $a$’s and $b$’s may be anything we like. The notation ‘$F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$’ is also used as an alternative to the two-line form (5.76), since a one-line form sometimes works better typographically. The $a$’s are said to be upper parameters; they occur in the numerator of the terms of $F$. The $b$’s are lower parameters, and they occur in the denominator. The final quantity $z$ is called the argument.

Standard reference books often use ‘${}_mF_n$’ instead of ‘$F$’ as the name of a hypergeometric with $m$ upper parameters and $n$ lower parameters. But the extra subscripts tend to clutter up the formulas and waste our time, if we’re compelled to write them over and over. We can count how many parameters there are, so we usually don’t need extra additional unnecessary redundancy.

Many important functions occur as special cases of the general hypergeo- 
metric; indeed, that’s why hypergeometrics are so powerful. For example, the 
simplest case occurs when m = n = 0: There are no parameters at all, and 
we get the familiar series

$$F\Bigl({\ \atop\ }\Bigm|z\Bigr)=\sum_{k\ge0}\frac{z^k}{k!}=e^z.$$
Actually the notation looks a bit unsettling when m or n is zero. We can add 
an extra ‘1’ above and below in order to avoid this:

$$F\Bigl({1\atop 1}\Bigm|z\Bigr)=e^z.$$
In general we don’t change the function if we cancel a parameter that occurs 
in both numerator and denominator, or if we insert two identical parameters. 

The next simplest case has $m=1$, $a_1=1$, and $n=0$; we change the parameters to $m=2$, $a_1=a_2=1$, $n=1$, and $b_1=1$, so that $n>0$. This series also turns out to be familiar, because $1^{\overline k}=k!$:

$$F\Bigl({1,1\atop 1}\Bigm|z\Bigr)=\sum_{k\ge0}z^k=\frac{1}{1-z}.$$

It’s our old friend, the geometric series; $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$ is called hypergeometric because it includes the geometric series $F(1,1;1;z)$ as a very special case.

The general case $m=1$ and $n=0$ is, in fact, easy to sum in closed form,

$$F\Bigl({a,1\atop 1}\Bigm|z\Bigr)=\sum_{k\ge0}a^{\overline k}\,\frac{z^k}{k!}=\sum_{k\ge0}\binom{a+k-1}{k}z^k=\frac{1}{(1-z)^a}, \tag{5.77}$$

using (5.56). If we replace $a$ by $-a$ and $z$ by $-z$, we get the binomial theorem,

$$F\Bigl({-a,1\atop 1}\Bigm|-z\Bigr)=(1+z)^a.$$

A negative integer as upper parameter causes the infinite series to become finite, since $(-a)^{\overline k}=0$ whenever $k>a\ge0$ and $a$ is an integer.
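Definition (5.76) translates directly into code if we generate each term from the previous one using the term ratio. The function below is a sketch of that idea (the name `hyper` and the `terms` cutoff are our own); it returns a partial sum, which equals the whole series when an upper parameter is a nonpositive integer, and it assumes no lower parameter plus $k-1$ hits zero.

```python
from fractions import Fraction
from math import factorial


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), as in (5.76).

    Each term is built from the previous one with the ratio
    t_k / t_{k-1} = (k-1+a_1)...(k-1+a_m) z / ((k-1+b_1)...(k-1+b_n) k).
    """
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


z = Fraction(2, 3)
# F(; | z) with no parameters: partial sums of e^z.
print(hyper([], [], z, 6) == sum(z ** k / factorial(k) for k in range(6)))
# F(1, 1; 1 | z): the geometric series.
print(hyper([1, 1], [1], z, 6) == sum(z ** k for k in range(6)))
# F(-a, 1; 1 | -z) = (1 + z)^a terminates when a is a nonnegative integer.
print(all(hyper([-a, 1], [1], -z, a + 3) == (1 + z) ** a for a in range(8)))
```

All three lines should print `True`; the extra terms requested in the last check are exactly zero because of the upper parameter $-a$.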

The general case m = 0, n = 1 is another famous series, but it’s not as 
well known in the literature of discrete mathematics:

$$F\Bigl({1\atop b,1}\Bigm|z\Bigr)=\sum_{k\ge0}\frac{(b-1)!}{(b-1+k)!}\,\frac{z^k}{k!}=I_{b-1}\bigl(2\sqrt z\bigr)\frac{(b-1)!}{z^{(b-1)/2}}. \tag{5.78}$$

This function $I_{b-1}$ is called a “modified Bessel function” of order $b-1$. The special case $b=1$ gives us $F\bigl({1\atop 1,1}\bigm|z\bigr)=I_0\bigl(2\sqrt z\bigr)$, which is the interesting series $\sum_{k\ge0}z^k/k!^2$.

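The special case $m=n=1$ is called a “confluent hypergeometric series” and often denoted by the letter $M$:

$$F\Bigl({a\atop b}\Bigm|z\Bigr)=\sum_{k\ge0}\frac{a^{\overline k}}{b^{\overline k}}\,\frac{z^k}{k!}=M(a,b,z). \tag{5.79}$$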


This function, which has important applications to engineering, was intro- 
duced by Ernst Kummer. 

By now a few of us are wondering why we haven’t discussed convergence of the infinite series (5.76). The answer is that we can ignore convergence if we are using $z$ simply as a formal symbol. It is not difficult to verify that formal infinite sums of the form $\sum_{k\ge n}a_kz^k$ form a field, if the coefficients $a_k$ lie in a field. We can add, subtract, multiply, divide, differentiate, and do functional composition on such formal sums without worrying about convergence; any identities we derive will still be formally true. For example, the hypergeometric $F\bigl({1,1\atop\ }\bigm|z\bigr)=\sum_{k\ge0}k!\,z^k$ doesn’t converge for any nonzero $z$; yet we’ll see in Chapter 7 that we can still use it to solve problems. On the other hand, whenever we replace $z$ by a particular numerical value, we do have to be sure that the infinite sum is well defined.
The next step up in complication is actually the most famous hypergeometric of all. In fact, it was the hypergeometric series until about 1870, when everything was generalized to arbitrary $m$ and $n$. This one has two upper parameters and one lower parameter:

$$F\Bigl({a,b\atop c}\Bigm|z\Bigr)=\sum_{k\ge0}\frac{a^{\overline k}\,b^{\overline k}\,z^k}{c^{\overline k}\,k!}. \tag{5.80}$$


“There must be 
many universities 
to-day where 95 
per cent, if not 
100 per cent, of the 
functions studied by 
physics, engineering, 
and even mathe- 
matics students, 
are covered by 
this single symbol 
F(a, b; c; x).” 

— W. W. Sawyer [318] 


It is often called the Gaussian hypergeometric, because many of its subtle 
properties were first proved by Gauss in his doctoral dissertation of 1812 [143], 
although Euler [118] and Pfaff [292] had already discovered some remarkable 
things about it. One of its important special cases is

$$\ln(1+z)=z\,F\Bigl({1,1\atop 2}\Bigm|-z\Bigr)=z\sum_{k\ge0}\frac{k!\,k!}{(k+1)!}\,\frac{(-z)^k}{k!}=z-\frac{z^2}{2}+\frac{z^3}{3}-\frac{z^4}{4}+\cdots.$$

Notice that $z^{-1}\ln(1+z)$ is a hypergeometric function, but $\ln(1+z)$ itself cannot be hypergeometric, since a hypergeometric series always has the value 1 when $z=0$.

So far hypergeometrics haven’t actually done anything for us except pro- 
vide an excuse for name-dropping. But we’ve seen that several very different 
functions can all be regarded as hypergeometric; this will be the main point of 
interest in what follows. We’ll see that a large class of sums can be written as 
hypergeometric series in a “canonical” way, hence we will have a good filing 
system for facts about binomial coefficients. 

What series are hypergeometric? It’s easy to answer this question if we 
look at the ratio between consecutive terms: 


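$$F\biggl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\biggm|z\biggr)=\sum_{k\ge0}t_k,\qquad t_k=\frac{a_1^{\overline k}\ldots a_m^{\overline k}\,z^k}{b_1^{\overline k}\ldots b_n^{\overline k}\,k!}.$$

The first term is $t_0=1$, and the other terms have ratios given by

$$\frac{t_{k+1}}{t_k}=\frac{a_1^{\overline{k+1}}\ldots a_m^{\overline{k+1}}}{a_1^{\overline k}\ldots a_m^{\overline k}}\;\frac{b_1^{\overline k}\ldots b_n^{\overline k}}{b_1^{\overline{k+1}}\ldots b_n^{\overline{k+1}}}\;\frac{k!}{(k+1)!}\;\frac{z^{k+1}}{z^k}=\frac{(k+a_1)\ldots(k+a_m)\,z}{(k+b_1)\ldots(k+b_n)(k+1)}. \tag{5.81}$$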

This is a rational function of k, that is, a quotient of polynomials in k. 
According to the Fundamental Theorem of Algebra, any rational function 
of $k$ can be factored over the complex numbers and put into this form. The $a$’s are the negatives of the roots of the polynomial in the numerator, and the $b$’s are the negatives of the roots of the polynomial in the denominator. If the denominator doesn’t already contain the special factor $(k+1)$, we can include $(k+1)$ in both numerator and denominator. A constant factor remains, and we can call it $z$. Therefore hypergeometric series are precisely those series whose first term is 1 and whose term ratio $t_{k+1}/t_k$ is a rational function of $k$.
Suppose, for example, that we’re given an infinite series with term ratio

$$\frac{t_{k+1}}{t_k}=\frac{k^2+7k+10}{4k^2+1},$$

a rational function of $k$. The numerator polynomial splits nicely into two factors, $(k+2)(k+5)$, and the denominator is $4(k+i/2)(k-i/2)$. Since the denominator is missing the required factor $(k+1)$, we write the term ratio as

$$\frac{t_{k+1}}{t_k}=\frac{(k+2)(k+5)(k+1)(1/4)}{(k+i/2)(k-i/2)(k+1)},$$

and we can read off the results: The given series is

$$\sum_{k\ge0}t_k=t_0\,F\Bigl({2,5,1\atop i/2,\,-i/2}\Bigm|1/4\Bigr).$$
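Reading off parameters can be double-checked numerically: rebuild the term ratio from the claimed parameters in the form (5.81) and compare it with the given rational function. This sketch uses Python’s built-in complex numbers for $\pm i/2$; the function name and the tolerance are our own choices.

```python
def term_ratio(k, uppers, lowers, z):
    """t_{k+1}/t_k for F(uppers; lowers | z), in the form (5.81)."""
    num, den = z, k + 1
    for a in uppers:
        num *= k + a
    for b in lowers:
        den *= k + b
    return num / den


uppers, lowers, z = (2, 5, 1), (0.5j, -0.5j), 0.25   # parameters read off above
for k in range(8):
    given = (k * k + 7 * k + 10) / (4 * k * k + 1)
    assert abs(term_ratio(k, uppers, lowers, z) - given) < 1e-12
print("term ratios agree")
```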



Thus, we have a general method for finding the hypergeometric representation of a given quantity $S$, when such a representation is possible: First we write $S$ as an infinite series whose first term is nonzero. We choose a notation so that the series is $\sum_{k\ge0}t_k$ with $t_0\ne0$. Then we calculate $t_{k+1}/t_k$. If the term ratio is not a rational function of $k$, we’re out of luck. Otherwise we express it in the form (5.81); this gives parameters $a_1,\ldots,a_m$, $b_1,\ldots,b_n$, and an argument $z$, such that $S=t_0\,F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$.

Gauss’s hypergeometric series can be written in the recursively factored form


(Now is a good 
time to do warmup 
exercise 11.) 


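$$F\Bigl({a,b\atop c}\Bigm|z\Bigr)=1+\frac{a}{1}\,\frac{b}{c}\,z\biggl(1+\frac{a+1}{2}\,\frac{b+1}{c+1}\,z\Bigl(1+\frac{a+2}{3}\,\frac{b+2}{c+2}\,z\bigl(1+\cdots\bigr)\Bigr)\biggr)$$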


if we wish to emphasize the importance of term ratios. 
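The nested form is Horner’s rule applied to the series. The sketch below assumes the upper parameter $a$ is a nonpositive integer $-n$, so the nesting stops after $n$ levels, and it compares the inside-out evaluation with term-by-term summation; the function names and sample values are illustrative.

```python
from fractions import Fraction


def gauss_nested(a, b, c, z, n):
    """F(a, b; c | z) for a = -n, evaluated from the innermost parenthesis outward."""
    acc = Fraction(1)                       # innermost level: the factor (a+n) is zero
    for k in range(n - 1, -1, -1):
        # level k is 1 + ((a+k)/(k+1)) * ((b+k)/(c+k)) * z * (level k+1)
        acc = 1 + Fraction(a + k, k + 1) * (b + k) / (c + k) * z * acc
    return acc


def gauss_direct(a, b, c, z, n):
    """The same terminating series, summed term by term up to k = n."""
    total, term = Fraction(0), Fraction(1)
    for k in range(n + 1):
        total += term
        term = term * (a + k) * (b + k) / ((c + k) * (k + 1)) * z
    return total


b, c, z = Fraction(3, 7), Fraction(5, 2), Fraction(-2, 3)
print(all(gauss_nested(-n, b, c, z, n) == gauss_direct(-n, b, c, z, n) for n in range(8)))  # True
```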

Let’s try now to reformulate the binomial coefficient identities derived 
earlier in this chapter, expressing them as hypergeometrics. For example, 
let’s figure out what the parallel summation law, 



$$\sum_{k\le n}\binom{r+k}{k}=\binom{r+n+1}{n},\qquad\text{integer }n,$$

First derangements, now degenerates.


looks like in hypergeometric notation. We need to write the sum as an infinite series that starts at $k=0$, so we replace $k$ by $n-k$:

$$\sum_{k\ge0}\binom{r+n-k}{n-k}=\sum_{k\ge0}\frac{(r+n-k)!}{r!\,(n-k)!}=\sum_{k\ge0}t_k.$$

This series is formally infinite but actually finite, because the $(n-k)!$ in the denominator will make $t_k=0$ when $k>n$. (We’ll see later that $1/x!$ is defined for all $x$, and that $1/x!=0$ when $x$ is a negative integer. But for now, let’s blithely disregard such technicalities until we gain more hypergeometric experience.) The term ratio is

$$\frac{t_{k+1}}{t_k}=\frac{(r+n-k-1)!\,r!\,(n-k)!}{r!\,(n-k-1)!\,(r+n-k)!}=\frac{n-k}{r+n-k}=\frac{(k+1)(k-n)(1)}{(k-n-r)(k+1)}.$$

Furthermore $t_0=\binom{r+n}{n}$. Hence the parallel summation law is equivalent to the hypergeometric identity

$$\binom{r+n}{n}F\Bigl({1,-n\atop -n-r}\Bigm|1\Bigr)=\binom{r+n+1}{n}.$$

Dividing through by $\binom{r+n}{n}$ gives a slightly simpler version,

$$F\Bigl({1,-n\atop -n-r}\Bigm|1\Bigr)=\frac{r+n+1}{r+1},\qquad\text{if }\binom{r+n}{n}\ne0. \tag{5.82}$$


Let’s do another one. The term ratio of identity (5.16),

$$\sum_{k\le m}\binom{r}{k}(-1)^k=(-1)^m\binom{r-1}{m},\qquad\text{integer }m,$$

is $(k-m)/(r-m+k+1)=(k+1)(k-m)(1)/\bigl((k-m+r+1)(k+1)\bigr)$, after we replace $k$ by $m-k$; hence (5.16) gives a closed form for

$$F\Bigl({1,-m\atop -m+r+1}\Bigm|1\Bigr).$$

This is essentially the same as the hypergeometric function on the left of (5.82), but with $m$ in place of $n$ and $r+1$ in place of $-r$. Therefore identity (5.16) could have been derived from (5.82), the hypergeometric version of (5.9). (No wonder we found it easy to prove (5.16) by using (5.9).)

Before we go further, we should think about degenerate cases, because 
hypergeometrics are not defined when a lower parameter is zero or a negative 
integer. We usually apply the parallel summation identity when $r$ and $n$ are positive integers; but then $-n-r$ is a negative integer and the hypergeometric (5.76) is undefined. How then can we consider (5.82) to be legitimate? The answer is that we can take the limit of $F\bigl({1,-n\atop -n-r+\epsilon}\bigm|1\bigr)$ as $\epsilon\to0$.

We will look at such things more closely later in this chapter, but for now let’s just be aware that some denominators can be dynamite. It is interesting, however, that the very first sum we’ve tried to express hypergeometrically has turned out to be degenerate.

Another possibly sore point in our derivation of (5.82) is that we expanded $\binom{r+n-k}{n-k}$ as $(r+n-k)!/r!\,(n-k)!$. This expansion fails when $r$ is a negative integer, because $(-m)!$ has to be $\infty$ if the law

$$0!=0\cdot(-1)\cdot(-2)\cdot\ldots\cdot(-m+1)\cdot(-m)!$$

is going to hold. Again, we need to approach integer results by considering a limit of $r+\epsilon$ as $\epsilon\to0$.
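Here is a sketch that checks (5.82) exactly for non-integer $r$, and then handles an integer $r$ the way the text suggests, by nudging the lower parameter to $-n-r+\epsilon$. The `hyper` helper (our name) sums the series term by term, and `eps` is an arbitrary small rational.

```python
from fractions import Fraction


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), built from term ratios."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


def parallel_F(n, r):
    """F(1, -n; -n-r | 1); the upper parameter -n ends the series after k = n."""
    return hyper([1, -n], [-n - r], 1, n + 1)


r = Fraction(1, 3)
print(all(parallel_F(n, r) == (r + n + 1) / (r + 1) for n in range(6)))   # True

# Integer r makes the lower parameter -n-r a negative integer, so perturb it by eps.
n, r, eps = 4, 2, Fraction(1, 10**9)
perturbed = hyper([1, -n], [-n - r + eps], 1, n + 1)
print(float(perturbed), (r + n + 1) / (r + 1))   # both close to 7/3
```

For this family the perturbed sum works out to exactly $(n+r+1-\epsilon)/(r+1-\epsilon)$, which makes the limit $\epsilon\to0$ transparent.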

But we defined the factorial representation $\binom rk=r!/k!\,(r-k)!$ only when $r$ is an integer! If we want to work effectively with hypergeometrics, we need a factorial function that is defined for all complex numbers. Fortunately there is such a function, and it can be defined in many ways. Here’s one of the most useful definitions of $z!$, actually a definition of $1/z!$:


(We proved the 
identities originally 
for integer r, and 
used the polynomial 
argument to show 
that they hold in 
general. Now we’re 
proving them first 
for irrational r, 
and using a limiting 
argument to show 
that they hold for 
integers!) 


$$\frac{1}{z!}=\lim_{n\to\infty}\binom{n+z}{n}\,n^{-z}. \tag{5.83}$$


(See exercise 21. Euler [99, 100, 72] discovered this when he was 22 years old.) The limit can be shown to exist for all complex $z$, and it is zero only when $z$ is a negative integer. Another significant definition is

$$z!=\int_0^\infty t^z e^{-t}\,dt,\qquad\text{if }\Re z>-1. \tag{5.84}$$


This integral exists only when the real part of $z$ exceeds $-1$, but we can use the formula

$$z!=z\,(z-1)! \tag{5.85}$$

to extend the definition to all complex $z$ (except negative integers). Still another definition comes from Stirling’s interpolation of $\ln z!$ in (5.47). All of these approaches lead to the same generalized factorial function.
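Definition (5.83) converges slowly (the relative error is roughly $z(z+1)/2n$), but it is easy to watch numerically. This sketch compares the $n$th approximation with `1 / math.gamma(z + 1)`, relying on the relation $z!=\Gamma(z+1)$ stated as (5.86) below; the function name and the cutoff `200_000` are our own choices.

```python
import math


def inv_factorial_euler(z, n):
    """n-th approximation C(n+z, n) * n**(-z) to 1/z!, following (5.83)."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= (z + k) / k          # C(n+z, n) = prod_{k=1}^{n} (z+k)/k
    return prod * n ** (-z)


for z in (0.5, -0.5, 2.3, -1.7):
    approx = inv_factorial_euler(z, 200_000)
    exact = 1 / math.gamma(z + 1)    # z! = Gamma(z+1)
    print(z, approx, exact)

# The recurrence (5.85), z! = z (z-1)!, written with Gamma:
z = 0.37
print(math.isclose(math.gamma(z + 1), z * math.gamma(z)))
```

At a negative integer such as $z=-3$ one factor of the product is exactly zero, matching the remark that $1/z!$ vanishes there.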

There’s a very similar function called the Gamma function, which re- 
lates to ordinary factorials somewhat as rising powers relate to falling powers. 
Standard reference books often use factorials and Gamma functions simulta- 
neously, and it’s convenient to convert between them if necessary using the 
following formulas:

$$\Gamma(z+1)=z!; \tag{5.86}$$

$$(-z)!\,\Gamma(z)=\frac{\pi}{\sin\pi z}. \tag{5.87}$$


How do you write $z$ to the $\overline w$ power, when $\overline w$ is the complex conjugate of $w$?

$z^{(\overline w)}$.


We can use these generalized factorials to define generalized factorial powers, when $z$ and $w$ are arbitrary complex numbers:

$$z^{\underline w}=\frac{z!}{(z-w)!}; \tag{5.88}$$

$$z^{\overline w}=\frac{\Gamma(z+w)}{\Gamma(z)}. \tag{5.89}$$


The only proviso is that we must use appropriate limiting values when these formulas give $\infty/\infty$. (The formulas never give $0/0$, because factorials and Gamma-function values are never zero.) A binomial coefficient can be written

$$\binom{z}{w}=\lim_{\zeta\to z}\,\lim_{\omega\to w}\,\frac{\zeta!}{\omega!\,(\zeta-\omega)!} \tag{5.90}$$


I see, the lower index arrives at its limit first. That’s why $\binom zw$ is zero when $w$ is a negative integer.


when z and w are any complex numbers whatever. 

Armed with generalized factorial tools, we can return to our goal of re- 
ducing the identities derived earlier to their hypergeometric essences. The 
binomial theorem (5.13) turns out to be neither more nor less than (5.77), 
as we might expect. So the next most interesting identity to try is Vander- 
monde’s convolution (5.27): 



$$\sum_k\binom{r}{k}\binom{s}{n-k}=\binom{r+s}{n},\qquad\text{integer }n.$$


The $k$th term here is

$$t_k=\frac{r!\;s!}{(r-k)!\;k!\;(s-n+k)!\;(n-k)!},$$

and we are no longer too shy to use generalized factorials in these expressions. Whenever $t_k$ contains a factor like $(\alpha+k)!$, with a plus sign before the $k$, we get $(\alpha+k+1)!/(\alpha+k)!=k+\alpha+1$ in the term ratio $t_{k+1}/t_k$, by (5.85); this contributes the parameter ‘$\alpha+1$’ to the corresponding hypergeometric, as an upper parameter if $(\alpha+k)!$ was in the numerator of $t_k$, but as a lower parameter otherwise. Similarly, a factor like $(\alpha-k)!$ leads to $(\alpha-k-1)!/(\alpha-k)!=(-1)/(k-\alpha)$; this contributes ‘$-\alpha$’ to the opposite set of parameters (reversing the roles of upper and lower), and negates the hypergeometric argument. Factors like $r!$, which are independent of $k$, go into $t_0$ but disappear from the term ratio. Using such tricks we can predict without further calculation that the term ratio of (5.27) is

$$\frac{t_{k+1}}{t_k}=\frac{k-r}{k+1}\;\frac{k-n}{k+s-n+1}$$

times $(-1)^2=1$, and Vandermonde’s convolution becomes

$$\binom{s}{n}F\Bigl({-r,-n\atop s-n+1}\Bigm|1\Bigr)=\binom{r+s}{n}. \tag{5.91}$$


We can use this equation to determine F(a, b; c; z) in general, when z = 1 and 
when b is a negative integer. 

Let’s rewrite (5.91) in a form so that table lookup is easy when a new sum needs to be evaluated. The result turns out to be

$$F\Bigl({a,b\atop c}\Bigm|1\Bigr)=\frac{\Gamma(c-a-b)\,\Gamma(c)}{\Gamma(c-a)\,\Gamma(c-b)},\qquad\text{integer }b\le0\ \text{ or }\ \Re c>\Re a+\Re b. \tag{5.92}$$

Vandermonde’s convolution (5.27) covers only the case that one of the upper parameters, say $b$, is a nonpositive integer; but Gauss proved that (5.92) is valid also when $a$, $b$, $c$ are complex numbers whose real parts satisfy $\Re c>\Re a+\Re b$. In other cases, the infinite series $F\bigl({a,b\atop c}\bigm|1\bigr)$ doesn’t converge. When $b=-n$, the identity can be written more conveniently with factorial powers instead of Gamma functions:

$$F\Bigl({a,-n\atop c}\Bigm|1\Bigr)=\frac{(c-a)^{\overline n}}{c^{\overline n}},\qquad\text{integer }n\ge0. \tag{5.93}$$
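Formula (5.93) is exact for terminating series, so it can be checked with rational arithmetic; Gauss’s more general (5.92) needs a convergent infinite sum, so the second function below only compares a floating-point partial sum with the Gamma-function value. Function names and the cutoff are our own choices, and the sample parameters satisfy $\Re c>\Re a+\Re b$ with room to spare, so the neglected tail is tiny.

```python
from fractions import Fraction
import math


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), built from term ratios."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


def rising(x, n):
    """Rising factorial power x^(n rising) = x (x+1) ... (x+n-1)."""
    result = Fraction(1)
    for j in range(n):
        result *= x + j
    return result


def check_593(a, c, n):
    """Exact check of (5.93): F(a, -n; c | 1) = (c-a)^(n rising) / c^(n rising)."""
    return hyper([a, -n], [c], 1, n + 1) == rising(c - a, n) / rising(c, n)


def gauss_592(a, b, c, terms=100_000):
    """Floating-point partial sum of F(a, b; c | 1) next to Gauss's closed form (5.92)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1))
    closed = math.gamma(c - a - b) * math.gamma(c) / (math.gamma(c - a) * math.gamma(c - b))
    return total, closed


print(all(check_593(Fraction(2, 3), Fraction(-7, 4), n) for n in range(8)))   # True
print(gauss_592(0.5, 0.5, 3.5))   # two values that agree closely
```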


A few weeks ago, we 
were studying what 
Gauss had done in 
kindergarten. 

Now we’re studying 
stuff beyond his 
Ph.D. thesis. 

Is this intimidating
or what?


It turns out that all five of the identities in Table 169 are special cases of 
Vandermonde’s convolution; formula (5.93) covers them all, when proper at- 
tention is paid to degenerate situations. 

Notice that (5.82) is just the special case a = 1 of (5.93). Therefore we 
don’t really need to remember (5.82); and we don’t really need the identity 
(5.9) that led us to (5.82), even though Table 174 said that it was memorable. A computer program for formula manipulation, faced with the problem of evaluating $\sum_{k\le n}\binom{r+k}{k}$, could convert the sum to a hypergeometric and plug into the general identity for Vandermonde’s convolution.

Problem 1 in Section 5.2 asked for the value of

$$\sum_{k\ge0}\binom{m}{k}\Big/\binom{n}{k},\qquad\text{integers }n\ge m\ge0.$$

This problem is a natural for hypergeometrics, and after a bit of practice any hypergeometer can read off the parameters immediately as $F(1,-m;-n;1)$. Hmmm; that problem was yet another special takeoff on Vandermonde!
Kummer was a
summer. 


The summer of ’36. 


The sum in Problem 2 and Problem 4 likewise yields $F(2,1-n;2-m;1)$. (We need to replace $k$ by $k+1$ first.) And the “menacing” sum in Problem 6 turns out to be just $F(n+1,-n;2;1)$. Is there nothing more to sum, besides disguised versions of Vandermonde’s powerful convolution?

Well, yes, Problem 3 is a bit different. It deals with a special case of the general sum $\sum_k\binom{n-k}{k}z^k$ considered in (5.74), and this leads to a closed-form expression for

$$F\Bigl({1+2\lceil n/2\rceil,\,-n\atop 1/2}\Bigm|-z/4\Bigr).$$

We also proved something new in (5.55), when we looked at the coefficients of $(1-z)^r(1+z)^r$:

$$F\Bigl({1-c-2n,\,-2n\atop c}\Bigm|-1\Bigr)=(-1)^n\,\frac{(2n)!}{n!}\,\frac{(c-1)!}{(c+n-1)!},\qquad\text{integer }n\ge0.$$

This is called Kummer’s formula when it’s generalized to complex numbers:

$$F\Bigl({a,b\atop 1+b-a}\Bigm|-1\Bigr)=\frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \tag{5.94}$$


(Ernst Kummer [229] proved this in 1836.) 

It’s interesting to compare these two formulas. Replacing $c$ by $1-2n-a$, we find that the results are consistent if and only if

$$(-1)^n\,\frac{(2n)!}{n!}=\lim_{b\to-2n}\frac{(b/2)!}{b!}=\lim_{x\to-n}\frac{x!}{(2x)!} \tag{5.95}$$

when $n$ is a positive integer. Suppose, for example, that $n=3$; then we should have $-6!/3!=\lim_{x\to-3}x!/(2x)!$. We know that $(-3)!$ and $(-6)!$ are both infinite; but we might choose to ignore that difficulty and to imagine that $(-3)!=(-3)(-4)(-5)(-6)!$, so that the two occurrences of $(-6)!$ will cancel. Such temptations must, however, be resisted, because they lead to the wrong answer! The limit of $x!/(2x)!$ as $x\to-3$ is not $(-3)(-4)(-5)$ but rather $-6!/3!=(-4)(-5)(-6)$, according to (5.95).

The right way to evaluate the limit in (5.95) is to use equation (5.87), which relates negative-argument factorials to positive-argument Gamma functions. If we replace $x$ by $-n-\epsilon$ and let $\epsilon\to0$, two applications of (5.87) give

$$\frac{(-n-\epsilon)!}{(-2n-2\epsilon)!}=\frac{\Gamma(2n+2\epsilon)}{\Gamma(n+\epsilon)}\;\frac{\sin(2n+2\epsilon)\pi}{\sin(n+\epsilon)\pi}.$$

Now $\sin(x+y)=\sin x\cos y+\cos x\sin y$; so this ratio of sines is

$$\frac{\cos 2n\pi\,\sin 2\epsilon\pi}{\cos n\pi\,\sin\epsilon\pi}=(-1)^n\bigl(2+O(\epsilon^2)\bigr),$$

by the methods of Chapter 9. Therefore, by (5.86), we have

$$\lim_{\epsilon\to0}\frac{(-n-\epsilon)!}{(-2n-2\epsilon)!}=2(-1)^n\,\frac{\Gamma(2n)}{\Gamma(n)}=2(-1)^n\,\frac{(2n-1)!}{(n-1)!}=(-1)^n\,\frac{(2n)!}{n!},$$

as desired.
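The limit (5.95) can also be watched numerically, which makes the tempting wrong answer easy to rule out. This sketch evaluates $x!/(2x)!$ as `math.gamma(x + 1) / math.gamma(2 * x + 1)` at points $x=-n-\epsilon$ near the pole; the helper name `fact` and the particular $\epsilon$ values are our own choices.

```python
import math


def fact(x):
    """Generalized factorial x! = Gamma(x + 1); x must not be a negative integer."""
    return math.gamma(x + 1)


n = 3
target = (-1) ** n * math.factorial(2 * n) / math.factorial(n)   # -120.0
for eps in (1e-3, 1e-5, 1e-7):
    x = -n - eps
    print(eps, fact(x) / fact(2 * x))    # approaches target as eps shrinks
print("limit:", target, " tempting but wrong:", (-3) * (-4) * (-5))
```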

Let’s complete our survey by restating the other identities we’ve seen so far in this chapter, clothing them in hypergeometric garb. The triple-binomial sum in (5.29) can be written

$$F\Bigl({1-a-2n,\,1-b-2n,\,-2n\atop a,\ b}\Bigm|1\Bigr)=(-1)^n\,\frac{(2n)!}{n!}\,\frac{(a+b+2n-1)^{\overline n}}{a^{\overline n}\,b^{\overline n}},\qquad\text{integer }n\ge0.$$

When this one is generalized to complex numbers, it is called Dixon’s formula:

$$F\Bigl({a,b,c\atop 1+c-a,\,1+c-b}\Bigm|1\Bigr)=\frac{(c/2)!}{c!}\;\frac{(c-a)^{\underline{c/2}}\,(c-b)^{\underline{c/2}}}{(c-a-b)^{\underline{c/2}}},\qquad\Re a+\Re b<1+\Re c/2. \tag{5.96}$$

One of the most general formulas we’ve encountered is the triple-binomial sum (5.28), which yields Saalschütz’s identity:

$$F\Bigl({a,\,b,\,-n\atop c,\ a+b-c-n+1}\Bigm|1\Bigr)=\frac{(c-a)^{\overline n}\,(c-b)^{\overline n}}{c^{\overline n}\,(c-a-b)^{\overline n}}=\frac{(a-c)^{\underline n}\,(b-c)^{\underline n}}{(-c)^{\underline n}\,(a+b-c)^{\underline n}},\qquad\text{integer }n\ge0. \tag{5.97}$$

This formula gives the value at $z=1$ of the general hypergeometric series with three upper parameters and two lower parameters, provided that one of the upper parameters is a nonpositive integer and that $b_1+b_2=a_1+a_2+a_3+1$. (If the sum of the lower parameters exceeds the sum of the upper parameters by 2 instead of by 1, the formula of exercise 25 can be used to express $F(a_1,a_2,a_3;b_1,b_2;1)$ in terms of two hypergeometrics that satisfy Saalschütz’s identity.)

(Historical note: Saalschütz [315] independently discovered this formula almost 100 years after Pfaff [292] had first published it. Taking the limit as $n\to\infty$ yields equation (5.92).)

Our hard-won identity in Problem 8 of Section 5.2 reduces to

$$\frac{1}{1+x}\,F\Bigl({x+1,\,n+1,\,-n\atop 1,\ x+2}\Bigm|1\Bigr)=\frac{(-1)^n\,x^{\underline n}}{(x+n+1)^{\underline{n+1}}}.$$
Sigh. This is just the special case $c=1$ of Saalschütz’s identity (5.97), so we could have saved a lot of work by going to hypergeometrics directly!
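Saalschütz’s identity (5.97) is a terminating identity, so exact rational arithmetic can confirm both of its closed forms, as well as the $c=1$ specialization with upper parameters $x+1$, $n+1$, $-n$ and lower parameters $1$, $x+2$. The sketch below uses helper names of our own; the sample parameters are chosen so that no lower parameter becomes a nonpositive integer.

```python
from fractions import Fraction


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), built from term ratios."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


def rising(x, n):
    result = Fraction(1)
    for j in range(n):
        result *= x + j
    return result


def falling(x, n):
    result = Fraction(1)
    for j in range(n):
        result *= x - j
    return result


def saalschutz(a, b, c, n):
    """Left side of (5.97) and its two closed forms (rising and falling powers)."""
    lhs = hyper([a, b, -n], [c, a + b - c - n + 1], 1, n + 1)
    rhs_rising = rising(c - a, n) * rising(c - b, n) / (rising(c, n) * rising(c - a - b, n))
    rhs_falling = falling(a - c, n) * falling(b - c, n) / (falling(-c, n) * falling(a + b - c, n))
    return lhs, rhs_rising, rhs_falling


def saalschutz_c1(x, n):
    """c = 1 case: F(x+1, n+1, -n; 1, x+2 | 1)/(1+x) and (-1)^n x^(n falling)/(x+n+1)^(n+1 falling)."""
    lhs = hyper([x + 1, n + 1, -n], [1, x + 2], 1, n + 1) / (1 + x)
    rhs = (-1) ** n * falling(x, n) / falling(x + n + 1, n + 1)
    return lhs, rhs


a, b, c, x = Fraction(1, 3), Fraction(2, 5), Fraction(7, 4), Fraction(2, 7)
for n in range(7):
    lhs, r1, r2 = saalschutz(a, b, c, n)
    lhs_c1, rhs_c1 = saalschutz_c1(x, n)
    print(n, lhs == r1 == r2, lhs_c1 == rhs_c1)   # True True on every line
```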

What about Problem 7? That extra-menacing sum gives us the formula

$$F\Bigl({n+1,\ m-n,\ 1,\ \tfrac12\atop \tfrac12m+1,\ \tfrac12m+\tfrac12,\ 2}\Bigm|1\Bigr)=\frac{m}{n},$$

which is the first case we’ve seen with three lower parameters. So it looks new. But it really isn’t; the left-hand side can be replaced by

$$\frac{m(m-1)}{2n(n-m+1)}\Biggl(F\Bigl({n,\ m-n-1,\ -\tfrac12\atop \tfrac12m,\ \tfrac12m-\tfrac12}\Bigm|1\Bigr)-1\Biggr),$$

using exercise 26, and Saalschütz’s identity wins again.

(Historical note: The great relevance of hypergeometric series to binomial coefficient identities was first pointed out by George Andrews in 1974 [9, section 5].)

Well, that’s another deflating experience, but it’s also another reason to 
appreciate the power of hypergeometric methods. 

The convolution identities in Table 202 do not have hypergeometric equivalents, because their term ratios are rational functions of $k$ only when $t$ is an integer. Equations (5.64) and (5.65) aren’t hypergeometric even when $t=1$. But we can take note of what (5.62) tells us when $t$ has small integer values:

$$F\Bigl({\tfrac12r,\ \tfrac12r+\tfrac12,\ -n,\ -n-s\atop r+1,\ -n-\tfrac12s,\ -n-\tfrac12s+\tfrac12}\Bigm|1\Bigr)=\binom{r+s+2n}{n}\Big/\binom{s+2n}{n};$$

$$F\Bigl({\tfrac13r,\ \tfrac13r+\tfrac13,\ \tfrac13r+\tfrac23,\ -n,\ -n-\tfrac12s,\ -n-\tfrac12s+\tfrac12\atop \tfrac12r+\tfrac12,\ \tfrac12r+1,\ -n-\tfrac13s,\ -n-\tfrac13s+\tfrac13,\ -n-\tfrac13s+\tfrac23}\Bigm|1\Bigr)=\binom{r+s+3n}{n}\Big/\binom{s+3n}{n}.$$

The first of these formulas gives the result of Problem 7 again, when the quantities $(r,s,n)$ are replaced respectively by $(1,\;2n+1-m,\;-1-n)$.
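The $t=2$ and $t=3$ forms of (5.62), and the Problem 7 formula, are all terminating hypergeometrics at $z=1$, so they can be checked exactly. In this sketch (helper names ours) the parameters $r$ and $s$ are arbitrary rationals chosen to keep every lower parameter away from zero and the negative integers; the Problem 7 check assumes integers $n\ge m\ge0$ with $n>0$.

```python
from fractions import Fraction


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), built from term ratios."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


def binom(r, k):
    result = Fraction(1)
    for j in range(k):
        result = result * (r - j) / (j + 1)
    return result


half, third = Fraction(1, 2), Fraction(1, 3)


def t2_sides(r, s, n):
    """Both sides of the t = 2 formula obtained from (5.62)."""
    lhs = hyper([r * half, r * half + half, -n, -n - s],
                [r + 1, -n - s * half, -n - s * half + half], 1, n + 1)
    return lhs, binom(r + s + 2 * n, n) / binom(s + 2 * n, n)


def t3_sides(r, s, n):
    """Both sides of the t = 3 formula obtained from (5.62)."""
    lhs = hyper([r * third, r * third + third, r * third + 2 * third, -n, -n - s * half, -n - s * half + half],
                [r * half + half, r * half + 1, -n - s * third, -n - s * third + third, -n - s * third + 2 * third],
                1, n + 1)
    return lhs, binom(r + s + 3 * n, n) / binom(s + 3 * n, n)


def problem7(m, n):
    """F(n+1, m-n, 1, 1/2; m/2+1, m/2+1/2, 2 | 1), expected to equal m/n."""
    return hyper([n + 1, m - n, 1, half], [m * half + 1, m * half + half, 2], 1, n + 1)


r, s = Fraction(1, 3), Fraction(2, 7)
lhs, rhs = t2_sides(r, s, 4)
print(lhs == rhs, problem7(2, 5) == Fraction(2, 5))   # True True
```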

Finally, the “unexpected” sum (5.20) gives us an unexpected hypergeometric identity that turns out to be quite instructive. Let’s look at it in slow motion. First we convert to an infinite sum,

$$\sum_{k\le m}\binom{m+k}{k}2^{-k}=2^m\quad\Longleftrightarrow\quad\sum_{k\ge0}\binom{2m-k}{m-k}2^k=2^{2m}.$$

The term ratio from $(2m-k)!\,2^k/m!\,(m-k)!$ is $2(k-m)/(k-2m)$, so we have a hypergeometric identity with $z=2$:

$$\binom{2m}{m}F\Bigl({1,-m\atop -2m}\Bigm|2\Bigr)=2^{2m},\qquad\text{integer }m\ge0. \tag{5.98}$$
But look at the lower parameter ‘$-2m$’. Negative integers are verboten, so this identity is undefined!

It’s high time to look at such limiting cases carefully, as promised earlier, 
because degenerate hypergeometrics can often be evaluated by approaching 
them from nearby nondegenerate points. We must be careful when we do this, 
because different results can be obtained if we take limits in different ways. 
For example, here are two limits that turn out to be quite different when one 
of the upper parameters is increased by e: 


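$$\begin{aligned}
\lim_{\epsilon\to0}F\Bigl({-1+\epsilon,\,-3\atop -2+\epsilon}\Bigm|1\Bigr)
&=\lim_{\epsilon\to0}\biggl(1+\frac{(-1+\epsilon)(-3)}{(-2+\epsilon)\,1!}+\frac{(-1+\epsilon)(\epsilon)(-3)(-2)}{(-2+\epsilon)(-1+\epsilon)\,2!}\\
&\qquad\qquad+\frac{(-1+\epsilon)(\epsilon)(1+\epsilon)(-3)(-2)(-1)}{(-2+\epsilon)(-1+\epsilon)(\epsilon)\,3!}\biggr)\\
&=1-\tfrac32+0+\tfrac12=0;\\[4pt]
\lim_{\epsilon\to0}F\Bigl({-1,\,-3\atop -2+\epsilon}\Bigm|1\Bigr)
&=\lim_{\epsilon\to0}\biggl(1+\frac{(-1)(-3)}{(-2+\epsilon)\,1!}+0+0\biggr)=1-\tfrac32+0+0=-\tfrac12.
\end{aligned}$$

Similarly, we have defined $\binom{-1}{-1}=0=\lim_{\epsilon\to0}\binom{-1+\epsilon}{-1}$; this is not the same as $\lim_{\epsilon\to0}\binom{-1+\epsilon}{-1+\epsilon}=1$. The proper way to treat (5.98) as a limit is to realize that the upper parameter $-m$ is being used to make all terms of the series $\sum_{k\ge0}\binom{2m-k}{m-k}2^k$ zero for $k>m$; this means that we want to make the following more precise statement:

$$\binom{2m}{m}\lim_{\epsilon\to0}F\Bigl({1,-m\atop -2m+\epsilon}\Bigm|2\Bigr)=2^{2m},\qquad\text{integer }m\ge0. \tag{5.99}$$

Each term of this limit is well defined, because the denominator factor $(-2m)^{\overline k}$ does not become zero until $k>2m$. Therefore this limit gives us exactly the sum (5.20) we began with.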
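Exact rational arithmetic with a small but nonzero $\epsilon$ makes the difference between the two limits above visible, and shows the perturbed version of (5.98) approaching $2^{2m}$. The values of `eps` and `tiny` are arbitrary, and `hyper` is our own helper; with these parameters the first perturbed sum happens to be exactly 0 for every $\epsilon$.

```python
from fractions import Fraction
from math import comb


def hyper(uppers, lowers, z, terms):
    """Partial sum t_0 + ... + t_{terms-1} of F(uppers; lowers | z), built from term ratios."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        if k > 0:
            ratio = Fraction(z) / k
            for a in uppers:
                ratio *= a + k - 1
            for b in lowers:
                ratio /= b + k - 1
            term *= ratio
        total += term
    return total


eps = Fraction(1, 10**6)
first = hyper([-1 + eps, -3], [-2 + eps], 1, 4)    # upper -1 and lower -2 both shifted by eps
second = hyper([-1, -3], [-2 + eps], 1, 4)         # only the lower parameter shifted
print(first, float(second))                        # 0 and about -0.5

# Perturb the lower parameter -2m, keeping the upper -m that cuts off terms k > m.
tiny = Fraction(1, 10**12)
for m in range(6):
    value = comb(2 * m, m) * hyper([1, -m], [-2 * m + tiny], 2, m + 1)
    print(m, float(value), 4 ** m)                 # value is very close to 2^(2m)
```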


5.6 HYPERGEOMETRIC TRANSFORMATIONS 

It should be clear by now that a database of known hypergeometric 
closed forms is a useful tool for doing sums of binomial coefficients. We 
simply convert any given sum into its canonical hypergeometric form, then 
look it up in the table. If it’s there, fine, we’ve got the answer. If not, we 
can add it to the database if the sum turns out to be expressible in closed 
form. We might also include entries in the table that say, “This sum does 
not have a simple closed form in general.” For example, the sum $\sum_{k\le m}\binom{n}{k}$ corresponds to the hypergeometric

$$\binom{n}{m}F\Bigl({1,-m\atop n-m+1}\Bigm|-1\Bigr),\qquad\text{integers }n\ge m\ge0; \tag{5.100}$$

this has a simple closed form only if $m$ is near 0, $\frac12n$, or $n$.

The hypergeometric database should really be a “knowledge base.”

But there’s more to the story, since hypergeometric functions also obey 
identities of their own. This means that every closed form for hypergeometrics 
leads to additional closed forms and to additional entries in the database. For 
example, the identities in exercises 25 and 26 tell us how to transform one 
hypergeometric into two others with similar but different parameters. These 
can in turn be transformed again. 

In 1797, J. F. Pfaff [292] discovered a surprising reflection law,

$$\frac{1}{(1-z)^a}\,F\Bigl({a,b\atop c}\Bigm|\frac{-z}{1-z}\Bigr)=F\Bigl({a,\,c-b\atop c}\Bigm|z\Bigr), \tag{5.101}$$

which is a transformation of another type. This is a formal identity in power series, if the quantity $(-z)^k/(1-z)^{k+a}$ is replaced by the infinite series $(-z)^k\bigl(1+\binom{k+a}{1}z+\binom{k+a+1}{2}z^2+\cdots\bigr)$ when the left-hand side is expanded (see exercise 50). We can use this law to derive new formulas from the identities we already know, when $z\ne1$.
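A floating-point spot check of Pfaff’s reflection law (5.101) is straightforward when both $|z|$ and $|z/(1-z)|$ are small enough for the series to converge quickly. The sketch below (our own names, an arbitrary 400-term cutoff) evaluates both sides for sample values of $a$, $b$, $c$.

```python
def hyper2f1(a, b, c, z, terms=400):
    """Floating-point partial sum of F(a, b; c | z); adequate for |z| well below 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
    return total


def pfaff_sides(a, b, c, z):
    """Both sides of the reflection law (5.101)."""
    lhs = hyper2f1(a, b, c, -z / (1 - z)) / (1 - z) ** a
    rhs = hyper2f1(a, c - b, c, z)
    return lhs, rhs


print(pfaff_sides(0.3, 0.7, 1.9, 0.2))    # two nearly identical numbers
```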

For example, Kummer’s formula (5.94) can be combined with the reflection law (5.101) if we choose the parameters so that both identities apply:

$$2^{-a}F\Bigl({a,\,1-a\atop 1+b-a}\Bigm|\tfrac12\Bigr)=F\Bigl({a,\,b\atop 1+b-a}\Bigm|-1\Bigr)=\frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \tag{5.102}$$


We can now set $a=-n$ and go back from this equation to a new identity in binomial coefficients that we might need some day:

$$\sum_k\frac{(-n)^{\overline k}\,(1+n)^{\overline k}\,2^{-k}}{(1+b+n)^{\overline k}\;k!}=\sum_k\binom{n}{k}\binom{n+k}{k}\Big/\binom{n+b+k}{k}\,\Bigl(\frac{-1}{2}\Bigr)^k=2^{-n}\,\frac{(b/2)!\,(b+n)!}{b!\,(b/2+n)!},\qquad\text{integer }n\ge0. \tag{5.103}$$

For example, when $n=3$ this identity says that

$$1-\frac{3\cdot4}{2(4+b)}+\frac{3\cdot5\cdot4}{4(4+b)(5+b)}-\frac{6\cdot5\cdot4}{8(4+b)(5+b)(6+b)}=\frac{(b+3)(b+2)(b+1)}{(b+6)(b+4)(b+2)}.$$
It’s almost unbelievable, but true, for all b. (Except when a factor in the
denominator vanishes.) 
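Identity (5.103) is easy to spot-check with exact rational arithmetic. The following is a minimal sketch of our own, written with Python's standard `fractions` module. The helper `hyp` is a hypothetical name: it sums a terminating hypergeometric series directly from its term ratio. For the right-hand side, the sketch uses the simplification $2^{-n}(b/2)!\,(b+n)!/\bigl(b!\,(b/2+n)!\bigr) = \prod_{j=1}^{n}(b+j)/(b+2j)$, which is valid whenever the factorials are defined.

```python
from fractions import Fraction

# Illustrative helper names; nothing here comes from a library beyond fractions.


def hyp(upper, lower, z):
    """Sum a terminating hypergeometric series F(upper; lower | z) exactly.

    Assumes some upper parameter is a nonpositive integer (so the series
    stops) and that no lower parameter makes a denominator vanish before then.
    """
    upper = [Fraction(a) for a in upper]
    lower = [Fraction(b) for b in lower]
    z = Fraction(z)
    total, term, k = Fraction(0), Fraction(1), 0
    while True:
        total += term
        num = z
        for a in upper:
            num *= a + k
        if num == 0:              # an upper parameter reached zero: no more terms
            return total
        den = Fraction(k + 1)
        for b in lower:
            den *= b + k
        term = term * num / den   # multiply by the term ratio t(k+1)/t(k)
        k += 1


def lhs_5_103(n, b):
    """F(-n, 1+n; 1+b+n | 1/2), the hypergeometric side of (5.103)."""
    return hyp([-n, 1 + n], [1 + Fraction(b) + n], Fraction(1, 2))


def rhs_5_103(n, b):
    """2^{-n} (b/2)! (b+n)! / (b! (b/2+n)!), written as prod_{j<=n} (b+j)/(b+2j)."""
    b = Fraction(b)
    result = Fraction(1)
    for j in range(1, n + 1):
        result *= (b + j) / (b + 2 * j)
    return result


for b in [Fraction(1, 3), Fraction(5), Fraction(-7, 2)]:
    for n in range(7):
        assert lhs_5_103(n, b) == rhs_5_103(n, b)

print(lhs_5_103(3, 1))   # the n = 3 case with b = 1; prints 8/35
```

Exact fractions are used so that the comparison is an equality test, not a tolerance check.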

This is fun; let’s try again. Maybe we’ll find a formula that will really 
astonish our friends. What does Pfaff’s reflection law tell us if we apply it to 
the strange form (5.99), where z = 2? In this case we set $a = -m$, $b = 1$,
and $c = -2m+\epsilon$, obtaining




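$$F\left({a,\,-n\atop c}\Bigm|z\right) = \frac{(a-c)^{\underline n}}{(-c)^{\underline n}}\,F\left({a,\,-n\atop 1-n+a-c}\Bigm|1-z\right), \qquad\text{integer } n\ge0. \tag{5.105}$$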





How do you pronounce ϑ? (Dunno, but TeX calls it ‘vartheta’.)


Notice that when z = 1 this reduces to Vandermonde’s convolution, (5.93). 
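Transformation (5.105) states that $F(a,-n;c;z) = \frac{(a-c)^{\underline n}}{(-c)^{\underline n}}F(a,-n;1-n+a-c;1-z)$ for integer $n\ge0$. When $z=1$, the series on the right collapses to 1. Below is a small exact check; it is our own sketch, not the book's. It repeats the hypothetical helper `hyp` so that the block stands alone, and the sample parameters $a = 1/3$, $c = 5/2$ are arbitrary choices that keep every denominator nonzero.

```python
from fractions import Fraction


def hyp(upper, lower, z):
    """Exact sum of a terminating F(upper; lower | z), built from its term ratio."""
    upper = [Fraction(a) for a in upper]
    lower = [Fraction(b) for b in lower]
    z = Fraction(z)
    total, term, k = Fraction(0), Fraction(1), 0
    while True:
        total += term
        num = z
        for a in upper:
            num *= a + k
        if num == 0:              # series has terminated
            return total
        den = Fraction(k + 1)
        for b in lower:
            den *= b + k
        term = term * num / den
        k += 1


def falling(x, n):
    """Falling factorial power x(x-1)...(x-n+1)."""
    result = Fraction(1)
    for j in range(n):
        result *= x - j
    return result


def sides_5_105(a, c, z, n):
    """Left and right sides of (5.105) for the given parameters."""
    a, c, z = Fraction(a), Fraction(c), Fraction(z)
    left = hyp([a, -n], [c], z)
    right = falling(a - c, n) / falling(-c, n) * hyp([a, -n], [1 - n + a - c], 1 - z)
    return left, right


a, c = Fraction(1, 3), Fraction(5, 2)
for n in range(6):
    left, right = sides_5_105(a, c, Fraction(3, 7), n)
    assert left == right
    # z = 1: the right-hand series is F(...|0) = 1, leaving Vandermonde's convolution
    left1, right1 = sides_5_105(a, c, 1, n)
    assert left1 == right1 == falling(a - c, n) / falling(-c, n)
```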

Differentiation seems to be useful, if this example is any indication; we 
also found it helpful in Chapter 2, when summing $x+2x^2+\cdots+nx^n$. Let’s
see what happens when a general hypergeometric series is differentiated with 
respect to z: 


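$$\begin{aligned}
\frac{d}{dz}F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right)
&= \sum_{k\ge1}\frac{a_1^{\overline k}\dots a_m^{\overline k}\,z^{k-1}}{b_1^{\overline k}\dots b_n^{\overline k}\,(k-1)!}
 = \sum_{k+1\ge1}\frac{a_1^{\overline{k+1}}\dots a_m^{\overline{k+1}}\,z^{k}}{b_1^{\overline{k+1}}\dots b_n^{\overline{k+1}}\,k!}\\
&= \sum_{k\ge0}\frac{a_1(a_1+1)^{\overline k}\dots a_m(a_m+1)^{\overline k}\,z^k}{b_1(b_1+1)^{\overline k}\dots b_n(b_n+1)^{\overline k}\,k!}
 = \frac{a_1\dots a_m}{b_1\dots b_n}\,F\left({a_1+1,\dots,a_m+1\atop b_1+1,\dots,b_n+1}\Bigm|z\right).
\end{aligned} \tag{5.106}$$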


The parameters move out and shift up. 
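Rule (5.106) can be checked on a terminating series by working with coefficient lists, since differentiating $\sum_k c_kz^k$ gives the coefficients $(k+1)c_{k+1}$. This is a minimal sketch of our own. It assumes exact `Fraction` arithmetic, and `coeffs` and `derivative` are illustrative names. The coefficients of $F$ come straight from the term ratio.

```python
from fractions import Fraction


def coeffs(upper, lower, count):
    """Coefficients of z^0, ..., z^(count-1) in F(upper; lower | z).

    Built from the term ratio; once a coefficient vanishes, all later ones do too.
    """
    upper = [Fraction(a) for a in upper]
    lower = [Fraction(b) for b in lower]
    out, t = [], Fraction(1)
    for k in range(count):
        out.append(t)
        if t != 0:
            num, den = Fraction(1), Fraction(k + 1)
            for a in upper:
                num *= a + k
            for b in lower:
                den *= b + k
            t = t * num / den
    return out


def derivative(c):
    """Coefficients of d/dz applied to sum_k c[k] z^k."""
    return [k * c[k] for k in range(1, len(c))]


upper, lower = [Fraction(-4), Fraction(2, 3)], [Fraction(7, 5)]
left = derivative(coeffs(upper, lower, 8))       # F is a polynomial of degree 4

factor = Fraction(1)                             # a_1 ... a_m / (b_1 ... b_n)
for a in upper:
    factor *= a
for b in lower:
    factor /= b
shifted = coeffs([a + 1 for a in upper], [b + 1 for b in lower], 7)
right = [factor * c for c in shifted]
assert left == right
```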

It’s also possible to use differentiation to tweak just one of the parameters 
while holding the rest of them fixed. For this we use the operator 


$$\vartheta = z\,\frac{d}{dz},$$


which acts on a function by differentiating it and then multiplying by z. This 
operator gives 


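$$\vartheta F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right) = z\sum_{k\ge1}\frac{a_1^{\overline k}\dots a_m^{\overline k}\,z^{k-1}}{b_1^{\overline k}\dots b_n^{\overline k}\,(k-1)!} = \sum_{k\ge0}\frac{k\,a_1^{\overline k}\dots a_m^{\overline k}\,z^k}{b_1^{\overline k}\dots b_n^{\overline k}\,k!},$$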

which by itself isn’t too useful. But if we multiply F by one of its upper
parameters, say $a_1$, and add it to $\vartheta F$, we get


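$$\begin{aligned}
(\vartheta+a_1)F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right)
&= \sum_{k\ge0}\frac{(k+a_1)a_1^{\overline k}\dots a_m^{\overline k}\,z^k}{b_1^{\overline k}\dots b_n^{\overline k}\,k!}
 = \sum_{k\ge0}\frac{a_1(a_1+1)^{\overline k}a_2^{\overline k}\dots a_m^{\overline k}\,z^k}{b_1^{\overline k}\dots b_n^{\overline k}\,k!}\\
&= a_1F\left({a_1+1,\,a_2,\,\dots,\,a_m\atop b_1,\,\dots,\,b_n}\Bigm|z\right).
\end{aligned}$$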

Only one parameter has been shifted. 
A similar trick works with lower parameters, but in this case things shift
down instead of up: 


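$$\begin{aligned}
(\vartheta+b_1-1)F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right)
&= \sum_{k\ge0}\frac{(k+b_1-1)a_1^{\overline k}\dots a_m^{\overline k}\,z^k}{b_1^{\overline k}\dots b_n^{\overline k}\,k!}
 = \sum_{k\ge0}\frac{(b_1-1)a_1^{\overline k}\dots a_m^{\overline k}\,z^k}{(b_1-1)^{\overline k}b_2^{\overline k}\dots b_n^{\overline k}\,k!}\\
&= (b_1-1)F\left({a_1,\,\dots,\,a_m\atop b_1-1,\,b_2,\,\dots,\,b_n}\Bigm|z\right).
\end{aligned}$$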


We can now combine all these operations and make a mathematical “pun” 
by expressing the same quantity in two different ways. Namely, we have 


$$(\vartheta+a_1)\dots(\vartheta+a_m)F = a_1\dots a_m\,F\left({a_1+1,\,\dots,\,a_m+1\atop b_1,\,\dots,\,b_n}\Bigm|z\right)$$


Ever hear the one 
about the brothers 
who named their 
cattle ranch Focus, 
because it’s where 
the sons raise meat? 


and 


$$(\vartheta+b_1-1)\dots(\vartheta+b_n-1)F = (b_1-1)\dots(b_n-1)\,F\left({a_1,\,\dots,\,a_m\atop b_1-1,\,\dots,\,b_n-1}\Bigm|z\right),$$


where $F = F(a_1,\dots,a_m;b_1,\dots,b_n;z)$. And (5.106) tells us that the top line
is the derivative of the bottom line. Therefore the general hypergeometric 
function F satisfies the differential equation 


$$D(\vartheta+b_1-1)\dots(\vartheta+b_n-1)F = (\vartheta+a_1)\dots(\vartheta+a_m)F, \tag{5.107}$$

where $D$ is the operator $\frac{d}{dz}$.

This cries out for an example. Let’s find the differential equation satisfied 
by the standard 2-over-1 hypergeometric series $F(z) = F(a,b;c;z)$. According
to (5.107), we have 

$$D(\vartheta+c-1)F = (\vartheta+a)(\vartheta+b)F.$$


What does this mean in ordinary notation? Well, $(\vartheta+c-1)F$ is $zF'(z)+(c-1)F(z)$, and the derivative of this gives the left-hand side,

$$F'(z) + zF''(z) + (c-1)F'(z).$$
The function $F(z) = (1-z)^r$ satisfies $\vartheta F = z(\vartheta-r)F$. This gives another proof of the binomial theorem.


On the right-hand side we have

$$\begin{aligned}
(\vartheta+a)\bigl(zF'(z)+bF(z)\bigr) &= z\frac{d}{dz}\bigl(zF'(z)+bF(z)\bigr) + a\bigl(zF'(z)+bF(z)\bigr)\\
&= zF'(z) + z^2F''(z) + bzF'(z) + azF'(z) + abF(z).
\end{aligned}$$

Equating the two sides tells us that

$$z(1-z)F''(z) + \bigl(c - z(a+b+1)\bigr)F'(z) - abF(z) = 0. \tag{5.108}$$

This equation is equivalent to the factored form (5.107). 
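Equation (5.108) can be confirmed for a polynomial instance, such as $F(-6, \frac27; \frac{11}{3}; z)$, using nothing more than polynomial arithmetic. This is a sketch under our own conventions: coefficient lists with the constant term first, exact fractions, and hypothetical helper names. It is not a general differential-equation solver.

```python
from fractions import Fraction


def coeffs(upper, lower, count):
    """Coefficients of z^0, ..., z^(count-1) in F(upper; lower | z), from the term ratio."""
    upper = [Fraction(a) for a in upper]
    lower = [Fraction(b) for b in lower]
    out, t = [], Fraction(1)
    for k in range(count):
        out.append(t)
        if t != 0:
            num, den = Fraction(1), Fraction(k + 1)
            for a in upper:
                num *= a + k
            for b in lower:
                den *= b + k
            t = t * num / den
    return out


def deriv(p):
    """Derivative of sum_k p[k] z^k, as a coefficient list."""
    return [k * p[k] for k in range(1, len(p))]


def mul(p, q):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def add(*polys):
    """Sum of coefficient lists of possibly different lengths."""
    size = max(len(p) for p in polys)
    return [sum((p[i] for p in polys if i < len(p)), Fraction(0)) for i in range(size)]


def residual_5_108(a, b, c, degree):
    """Coefficients of z(1-z)F'' + (c - z(a+b+1))F' - abF for F = F(a, b; c | z).

    Assumes F is a polynomial of the given degree (for example a = -degree).
    """
    F = coeffs([a, b], [c], degree + 1)
    F1, F2 = deriv(F), deriv(deriv(F))
    return add(mul([0, 1, -1], F2),                 # z(1-z) F''
               mul([c, -(a + b + 1)], F1),          # (c - (a+b+1) z) F'
               [-a * b * f for f in F])             # -ab F


res = residual_5_108(Fraction(-6), Fraction(2, 7), Fraction(11, 3), 6)
assert all(x == 0 for x in res)
```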

Conversely, we can go back from the differential equation to the power 
series. Let’s assume that $F(z) = \sum_{k\ge0}t_kz^k$ is a power series satisfying (5.107). A straightforward calculation shows that we must have

$$\frac{t_{k+1}}{t_k} = \frac{(k+a_1)\dots(k+a_m)}{(k+b_1)\dots(k+b_n)(k+1)};$$

hence $F(z)$ must be $t_0\,F(a_1,\dots,a_m;b_1,\dots,b_n;z)$. We’ve proved that the
hypergeometric series (5.76) is the only formal power series that satisfies the 
differential equation (5.107) and has the constant term 1. 

It would be nice if hypergeometrics solved all the world’s differential 
equations, but they don’t quite. The right-hand side of (5.107) always expands into a sum of terms of the form $\alpha_kz^kF^{(k)}(z)$, where $F^{(k)}(z)$ is the kth derivative $D^kF(z)$; the left-hand side always expands into a sum of terms of the form $\beta_kz^{k-1}F^{(k)}(z)$ with $k>0$. So the differential equation (5.107) always takes the special form

$$z^{n-1}(\beta_n - z\alpha_n)F^{(n)}(z) + \cdots + (\beta_1 - z\alpha_1)F'(z) - \alpha_0F(z) = 0.$$

Equation (5.108) illustrates this in the case n = 2. Conversely, we will prove in exercise 6.13 that any differential equation of this form can be factored in terms of the ϑ operator, to give an equation like (5.107). So these are the differential equations whose solutions are power series with rational term ratios.

Multiplying both sides of (5.107) by z dispenses with the D operator and gives us an instructive all-ϑ form,

$$\vartheta(\vartheta+b_1-1)\dots(\vartheta+b_n-1)F = z(\vartheta+a_1)\dots(\vartheta+a_m)F. \tag{5.109}$$

The first factor $\vartheta = (\vartheta+1-1)$ on the left corresponds to the $(k+1)$ in the term ratio (5.81), which corresponds to the $k!$ in the denominator of the kth term in a general hypergeometric series. The other factors $(\vartheta+b_j-1)$ correspond to the denominator factor $(k+b_j)$, which corresponds to $b_j^{\overline k}$ in (5.76). On the right, the z corresponds to $z^k$, and $(\vartheta+a_j)$ corresponds to $a_j^{\overline k}$.
One use of this differential theory is to find and prove new transformations. For example, we can readily verify that both of the hypergeometrics

$$F\left({2a,\,2b\atop a+b+\frac12}\Bigm|z\right) \qquad\text{and}\qquad F\left({a,\,b\atop a+b+\frac12}\Bigm|4z(1-z)\right)$$

satisfy the differential equation

$$z(1-z)F''(z) + \bigl(a+b+\tfrac12\bigr)(1-2z)F'(z) - 4abF(z) = 0;$$

hence Gauss’s identity [143, equation 102]

$$F\left({2a,\,2b\atop a+b+\frac12}\Bigm|z\right) = F\left({a,\,b\atop a+b+\frac12}\Bigm|4z(1-z)\right) \tag{5.110}$$

must be true. In particular,

$$F\left({2a,\,2b\atop a+b+\frac12}\Bigm|\frac12\right) = F\left({a,\,b\atop a+b+\frac12}\Bigm|1\right), \tag{5.111}$$

whenever both infinite sums converge.
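Gauss’s identity (5.110) is an identity of power series near $z=0$, and the caution given in the margin below matters. The floating-point sketch that follows is our own. The truncation at 600 terms and the sample values $a=0.3$, $b=0.7$ are arbitrary choices. It compares the two sides at $z=0.1$, where they agree, and at $z=0.8$, where they do not. At $z=0.8$ the right side equals its value at $1-z=0.2$, because $4z(1-z)$ is symmetric about $\frac12$, while the left side is not symmetric in this way.

```python
import math


def hyp_float(upper, lower, z, terms=600):
    """Partial sum of F(upper; lower | z) in floating point; assumes |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        num, den = z, float(k + 1)
        for a in upper:
            num *= a + k
        for b in lower:
            den *= b + k
        term *= num / den
    return total


def gauss_sides(a, b, z):
    """Both sides of (5.110): F(2a, 2b; a+b+1/2 | z) and F(a, b; a+b+1/2 | 4z(1-z))."""
    c = a + b + 0.5
    return (hyp_float([2 * a, 2 * b], [c], z),
            hyp_float([a, b], [c], 4 * z * (1 - z)))


left, right = gauss_sides(0.3, 0.7, 0.1)
assert math.isclose(left, right, rel_tol=1e-12)

left, right = gauss_sides(0.3, 0.7, 0.8)     # |z| > 1/2: the series identity fails
assert abs(left - right) > 1e-3
```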

Every new identity for hypergeometrics has consequences for binomial 
coefficients, and this one is no exception. Let’s consider the sum 



$$\sum_{k\le m}\binom{m-k}{n}\binom{m+n+1}{k}\Bigl(\frac{-1}{2}\Bigr)^k, \qquad\text{integers } m\ge n\ge0.$$


The terms are nonzero for $0\le k\le m-n$, and with a little delicate limit-taking as before we can express this sum as the hypergeometric

$$\lim_{\epsilon\to0}\binom mn F\left({n-m,\ -n-m-1+\alpha\epsilon\atop -m+\epsilon}\Bigm|\frac12\right).$$


The value of α doesn’t affect the limit, since the nonpositive upper parameter
$n-m$ cuts the sum off early. We can set $\alpha = 2$, so that (5.111) applies.
The limit can now be evaluated because the right-hand side is a special case 
of (5.92). The result can be expressed in simplified form, 




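$$\sum_{k\le m}\binom{m-k}{n}\binom{m+n+1}{k}\Bigl(\frac{-1}{2}\Bigr)^k = \binom{(m+n)/2}{n}\,2^{n-m}\,[m+n\text{ is even}], \qquad\text{integers } m\ge n\ge0, \tag{5.112}$$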


as shown in exercise 54. For example, when m = 5 and n = 2 we get $\binom52\binom80 - \binom42\binom81/2 + \binom32\binom82/4 - \binom22\binom83/8 = 10-24+21-7 = 0$; when m = 4 and n = 2, both sides give $\frac34$.
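Identity (5.112) can be checked exhaustively for small m and n with exact arithmetic. In this sketch, which is our own and uses hypothetical function names, the sum runs over $0\le k\le m$. That range covers every nonzero term, because $\binom{m-k}{n} = 0$ when $0\le m-k<n$.

```python
from fractions import Fraction
from math import comb


def lhs_5_112(m, n):
    """sum_{k <= m} C(m-k, n) C(m+n+1, k) (-1/2)^k, for integers m >= n >= 0."""
    return sum(comb(m - k, n) * comb(m + n + 1, k) * Fraction(-1, 2) ** k
               for k in range(m + 1))


def rhs_5_112(m, n):
    """C((m+n)/2, n) 2^(n-m) [m+n is even]."""
    if (m + n) % 2:
        return Fraction(0)
    return comb((m + n) // 2, n) * Fraction(2) ** (n - m)


for m in range(12):
    for n in range(m + 1):
        assert lhs_5_112(m, n) == rhs_5_112(m, n)
```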


(Caution: We can’t use (5.110) safely when $|z|>1/2$, unless both sides are polynomials; see exercise 53.)
We can also find cases where (5.110) gives binomial sums when $z=-1$, but these are really weird. If we set $a = \frac16-\frac13n$ and $b = -n$, we get the monstrous formula

$$F\left({\frac13-\frac23n,\ -2n\atop \frac23-\frac43n}\Bigm|-1\right) = F\left({\frac16-\frac13n,\ -n\atop \frac23-\frac43n}\Bigm|-8\right).$$


These hypergeometrics are nondegenerate polynomials when $n\not\equiv2 \pmod 3$;
and the parameters have been cleverly chosen so that the left-hand side can 
be evaluated by (5.94). We are therefore led to a truly mind-boggling result, 





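$$\sum_k\binom nk\binom{\frac13n-\frac16}{k}8^k\Big/\binom{\frac43n-\frac23}{k} = \binom{2n}{n}\Big/\binom{\frac43n-\frac23}{n}, \qquad\text{integer } n\ge0,\ n\not\equiv2 \pmod 3. \tag{5.113}$$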


The only use of 
(5.113) is to demon- 
strate the existence 
of incredibly useless 
identities. 


This is the most startling identity in binomial coefficients that we’ve seen. 
Small cases of the identity aren’t even easy to check by hand. (It turns out
that both sides do give $\frac{81}{7}$ when n = 3.) But the identity is completely
useless, of course; surely it will never arise in a practical problem. 
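Useless or not, (5.113) is easy to test by machine, which is more than can be said for checking it by hand. The sketch below is our own illustration. The helper `gbinom` is a hypothetical name that computes $\binom xk$ for rational x. Both sides are evaluated exactly for every $n<15$ with $n\not\equiv2\pmod 3$.

```python
from fractions import Fraction
from math import comb


def gbinom(x, k):
    """Generalized binomial coefficient C(x, k) for rational x and integer k >= 0."""
    result = Fraction(1)
    for j in range(k):
        result = result * (x - j) / (j + 1)
    return result


def sides_5_113(n):
    """Left and right sides of (5.113) as exact fractions."""
    x = Fraction(4 * n - 2, 3)          # (4/3)n - 2/3
    y = Fraction(2 * n - 1, 6)          # (1/3)n - 1/6
    left = sum(comb(n, k) * gbinom(y, k) * 8 ** k / gbinom(x, k)
               for k in range(n + 1))
    right = comb(2 * n, n) / gbinom(x, n)
    return left, right


for n in range(15):
    if n % 3 != 2:                      # n = 2 (mod 3) is excluded by (5.113)
        left, right = sides_5_113(n)
        assert left == right
```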

So that’s our hype for hypergeometrics. We’ve seen that hypergeometric 
series provide a high-level way to understand what’s going on in binomial 
coefficient sums. A great deal of additional information can be found in the 
classic book by Bailey [18] and its sequel by Gasper and Rahman [141]. 


5.7 PARTIAL HYPERGEOMETRIC SUMS 

Most of the sums we’ve evaluated in this chapter range over all indices $k\ge0$, but sometimes we’ve been able to find a closed form that works over a general range $a\le k<b$. For example, we know from (5.16) that

$$\sum_{k<m}\binom nk(-1)^k = (-1)^{m-1}\binom{n-1}{m-1}, \qquad\text{integer } m. \tag{5.114}$$

The theory in Chapter 2 gives us a nice way to understand formulas like this: 
If $f(k) = \Delta g(k) = g(k+1)-g(k)$, then we’ve agreed to write $\sum f(k)\,\delta k = g(k)+C$, and

$$\sum\nolimits_a^b f(k)\,\delta k = g(k)\Big|_a^b = g(b)-g(a).$$

Furthermore, when a and b are integers with $a\le b$, we have

$$\sum\nolimits_a^b f(k)\,\delta k = \sum_{a\le k<b}f(k) = g(b)-g(a).$$
Therefore identity (5.114) corresponds to the indefinite summation formula

$$\sum\binom nk(-1)^k\,\delta k = (-1)^{k-1}\binom{n-1}{k-1} + C,$$

and to the difference formula

$$\Delta\Bigl((-1)^{k-1}\binom{n-1}{k-1}\Bigr) = (-1)^k\binom nk.$$
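The correspondence between (5.114) and the difference formula is easy to verify numerically. The check includes negative n, where (5.16) still applies. This is a sketch with exact fractions. The helpers `binom` and `sign` are illustrative, and `binom` follows this chapter's convention that $\binom rk = 0$ for $k<0$.

```python
from fractions import Fraction


def binom(r, k):
    """C(r, k) for rational or integer r and integer k; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (r - j) / (j + 1)
    return result


def sign(e):
    """(-1)^e for any integer e, avoiding float results for negative e."""
    return -1 if e % 2 else 1


def g(n, k):
    """(-1)^(k-1) C(n-1, k-1), the indefinite sum behind (5.114)."""
    return sign(k - 1) * binom(n - 1, k - 1)


for n in range(-4, 8):
    for k in range(-2, 10):
        # difference formula: Delta g(k) = (-1)^k C(n, k)
        assert g(n, k + 1) - g(n, k) == sign(k) * binom(n, k)
    for m in range(10):
        # (5.114): sum_{0 <= k < m} (-1)^k C(n, k) = g(m), since g(0) = 0
        assert sum(sign(k) * binom(n, k) for k in range(m)) == g(n, m)
```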


It’s easy to start with a function g(k) and to compute $\Delta g(k) = f(k)$, a function whose sum will be $g(k)+C$. But it’s much harder to start with f(k) and to figure out its indefinite sum $\sum f(k)\,\delta k = g(k)+C$; this function g might not have a simple form. For example, there is apparently no simple form for $\sum\binom nk\,\delta k$; otherwise we could evaluate sums like $\sum_{k\le n/3}\binom nk$, about which we’re clueless. Yet maybe there is a simple form for $\sum\binom nk\,\delta k$ and we just haven’t thought of it; how can we be sure?

In 1977, R. W. Gosper [154] discovered a beautiful way to find indefinite sums $\sum f(k)\,\delta k = g(k)+C$ whenever f and g belong to a general class of functions called hypergeometric terms. Let us write


$$F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right)_k = \frac{a_1^{\overline k}\dots a_m^{\overline k}\,z^k}{b_1^{\overline k}\dots b_n^{\overline k}\,k!} \tag{5.115}$$


for the kth term of the hypergeometric series $F(a_1,\dots,a_m;b_1,\dots,b_n;z)$. We will regard $F(a_1,\dots,a_m;b_1,\dots,b_n;z)_k$ as a function of k, not of z. In many cases it turns out that there are parameters $c, A_1,\dots,A_M, B_1,\dots,B_N$, and Z such that



$$\sum F\left({a_1,\dots,a_m\atop b_1,\dots,b_n}\Bigm|z\right)_k\delta k = c\,F\left({A_1,\dots,A_M\atop B_1,\dots,B_N}\Bigm|Z\right)_k + C, \tag{5.116}$$


given $a_1,\dots,a_m$, $b_1,\dots,b_n$, and z. We will say that a given function $F(a_1,\dots,a_m;b_1,\dots,b_n;z)_k$ is summable in hypergeometric terms if such constants $c, A_1,\dots,A_M, B_1,\dots,B_N, Z$ exist. Gosper’s algorithm either finds the unknown constants or proves that no such constants exist.

In general, we say that t(k) is a hypergeometric term if t(k + 1 )/t(k) is a 
rational function of k, not identically zero. This means, in essence, that t(k) 
is a constant multiple of a term like (5.115). (A technicality arises, however, 
with respect to zeros, because we want t(k) to be meaningful when k is neg- 
ative and when one or more of the b’s in (5.115) is zero or a negative integer. 
Strictly speaking, we get the most general hypergeometric term by multiply- 
ing (5.115) by a nonzero constant times a power of 0, then cancelling zeros 
of the numerator with zeros of the denominator. The examples in exercise 12 
help clarify this general rule.) 
(Divisibility of polynomials is analogous to divisibility of integers. For example, $(k+\alpha)\backslash q(k)$ means that the quotient $q(k)/(k+\alpha)$ is a polynomial. It’s easy to see that $(k+\alpha)\backslash q(k)$ if and only if $q(-\alpha) = 0$.)

(Exercise 55 gives a clue about why we might want to make this magic substitution.)


Suppose we want to find $\sum t(k)\,\delta k$, when t(k) is a hypergeometric term. Gosper’s algorithm proceeds in two steps, each of which is fairly straightforward. Step 1 is to express the term ratio in the special form

$$\frac{t(k+1)}{t(k)} = \frac{p(k+1)}{p(k)}\,\frac{q(k)}{r(k+1)}, \tag{5.117}$$

where p, q, and r are polynomials subject to the following condition:

$$(k+\alpha)\backslash q(k)\ \text{and}\ (k+\beta)\backslash r(k) \implies \alpha-\beta\text{ is not a positive integer.} \tag{5.118}$$

This condition is easy to achieve: We start by provisionally setting $p(k) = 1$, and we set q(k) and $r(k+1)$ to the numerator and denominator of the term ratio, factoring them into linear factors. For example, if t(k) has the form (5.115), we start with the factorizations $q(k) = (k+a_1)\dots(k+a_m)z$ and $r(k) = (k+b_1-1)\dots(k+b_n-1)k$. Then we check if (5.118) is violated. If q and r have factors $(k+\alpha)$ and $(k+\beta)$ where $\alpha-\beta = N > 0$, we divide them out of q and r and replace p(k) by

$$p(k)(k+\alpha-1)^{\underline{N-1}} = p(k)(k+\alpha-1)(k+\alpha-2)\dots(k+\beta+1). \tag{5.119}$$

The new p, q, and r still satisfy (5.117), and we can repeat this process until (5.118) holds. We’ll see in a moment why (5.118) is important.
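Step 1 is mechanical once the numerator and denominator of the term ratio have been split into linear factors. The sketch below is our own. It tracks only the offsets α of factors $(k+\alpha)$ and sets constant factors such as z aside. It repeatedly cancels a pair whose difference $\alpha-\beta$ is a positive integer N, and each time it records the $N-1$ factors that (5.119) adds to p(k). The function name `gosper_step1` is hypothetical.

```python
from fractions import Fraction


def gosper_step1(q_roots, r_roots):
    """Step 1 of Gosper's algorithm for linear factors (a sketch).

    q(k) is described by the offsets alpha of its factors (k + alpha), r(k) by
    the offsets beta of its factors (k + beta); constant factors are left out.
    Returns (p_roots, q_roots, r_roots) with p(k) = prod (k + gamma), such that
    condition (5.118) holds: no alpha - beta is a positive integer.
    """
    p = []
    q = [Fraction(a) for a in q_roots]
    r = [Fraction(b) for b in r_roots]
    found = True
    while found:
        found = False
        for a in q:
            for b in r:
                gap = a - b
                if gap > 0 and gap.denominator == 1:
                    q.remove(a)
                    r.remove(b)
                    # (5.119): p(k) gains (k+a-1)(k+a-2)...(k+b+1), i.e. gap-1 factors
                    p.extend(a - j for j in range(1, int(gap)))
                    found = True
                    break
            if found:
                break
    return p, q, r
```

For instance, the term ratio $-(k+N)/(k+1)$ that arises later in this section has $q(k) = -(k+N)$ and $r(k) = k$. With $N=4$, `gosper_step1([4], [0])` yields the p-offsets 3, 2, 1, that is, $p(k) = (k+1)^{\overline 3}$, and it leaves q and r with only their constant factors.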

Step 2 of Gosper’s algorithm is to finish the job: to find a hypergeometric term T(k) such that

$$t(k) = T(k+1) - T(k), \tag{5.120}$$

whenever possible. But it’s not obvious how to do this; we need to develop some theory before we know how to proceed. Gosper noticed, after studying a lot of special cases, that it is wise to write the unknown function T(k) in the form

$$T(k) = \frac{r(k)s(k)t(k)}{p(k)}, \tag{5.121}$$


where s(k) is a secret function that must be discovered somehow. Plugging (5.121) into (5.120) and applying (5.117) gives

$$\begin{aligned}
t(k) &= \frac{r(k+1)s(k+1)t(k+1)}{p(k+1)} - \frac{r(k)s(k)t(k)}{p(k)}\\
&= \frac{q(k)s(k+1)t(k)}{p(k)} - \frac{r(k)s(k)t(k)}{p(k)};
\end{aligned}$$





so we need to have

$$p(k) = q(k)s(k+1) - r(k)s(k). \tag{5.122}$$

If we can find s(k) satisfying this fundamental recurrence relation, we’ve found
$\sum t(k)\,\delta k$. If we can’t, there’s no T.

We’re assuming that T(k) is a hypergeometric term, which means that $T(k+1)/T(k)$ is a rational function of k. Therefore, by (5.121) and (5.120), $r(k)s(k)/p(k) = T(k)/\bigl(T(k+1)-T(k)\bigr)$ is a rational function of k, and s(k) itself must be a quotient of polynomials:

$$s(k) = f(k)/g(k).$$

But in fact we can prove that s(k) is itself a polynomial. For if g(k) is not constant, and if f(k) and g(k) have no common factors, let N be the largest integer such that $(k+\beta)$ and $(k+\beta+N-1)$ both occur as factors of g(k) for some complex number β. The value of N is positive, since N = 1 always satisfies this condition. Equation (5.122) can be rewritten

$$p(k)g(k+1)g(k) = q(k)f(k+1)g(k) - r(k)g(k+1)f(k),$$

and if we set $k = -\beta$ and $k = -\beta-N$ we get

$$r(-\beta)g(1-\beta)f(-\beta) = 0 = q(-\beta-N)f(1-\beta-N)g(-\beta-N).$$

Now $f(-\beta)\ne0$ and $f(1-\beta-N)\ne0$, because f and g have no common roots. Also $g(1-\beta)\ne0$ and $g(-\beta-N)\ne0$, because g(k) would otherwise contain the factor $(k+\beta-1)$ or $(k+\beta+N)$, contrary to the maximality of N. Therefore

$$r(-\beta) = q(-\beta-N) = 0.$$

But this contradicts condition (5.118). Hence s(k) must be a polynomial.

Our task now boils down to finding a polynomial s(k) that satisfies (5.122), when p(k), q(k), and r(k) are given polynomials, or proving that no such polynomial exists. It’s easy to do this when s(k) has any particular degree d, since we can write

$$s(k) = \alpha_dk^d + \alpha_{d-1}k^{d-1} + \cdots + \alpha_0, \qquad \alpha_d\ne0 \tag{5.123}$$

for unknown coefficients $(\alpha_d,\dots,\alpha_0)$ and plug this expression into the fundamental recurrence (5.122). The polynomial s(k) will satisfy the recurrence if and only if the α’s satisfy the linear equations that result when we equate coefficients of each power of k in (5.122).


I see: Gosper came 
up with condition 
(5.118) in order to 
make this proof go 
through. 
Why isn’t it $r(k) = k+1$? Oh, I see.


But how can we determine the degree of s? It turns out that there actually are at most two possibilities. We can rewrite (5.122) in the form

$$2p(k) = Q(k)\bigl(s(k+1)+s(k)\bigr) + R(k)\bigl(s(k+1)-s(k)\bigr), \tag{5.124}$$

where $Q(k) = q(k)-r(k)$ and $R(k) = q(k)+r(k)$.

If s(k) has degree d, then the sum $s(k+1)+s(k) = 2\alpha_dk^d+\cdots$ also has degree d, while the difference $s(k+1)-s(k) = \Delta s(k) = d\alpha_dk^{d-1}+\cdots$ has degree $d-1$. (The zero polynomial can be assumed to have degree $-1$.) Let’s write deg(P) for the degree of a polynomial P. If $\deg(Q)\ge\deg(R)$, then the degree of the right-hand side of (5.124) is $\deg(Q)+d$, so we must have $d = \deg(p)-\deg(Q)$. On the other hand if $\deg(Q) < \deg(R) = d'$, we can write $Q(k) = \lambda'k^{d'-1}+\cdots$ and $R(k) = \lambda k^{d'}+\cdots$ where $\lambda\ne0$; the right-hand side of (5.124) has the form

$$(2\lambda'\alpha_d + \lambda d\alpha_d)k^{d+d'-1} + \cdots.$$

Ergo, two possibilities: Either $2\lambda'+\lambda d\ne0$, and $d = \deg(p)-\deg(R)+1$; or $2\lambda'+\lambda d = 0$, and $d > \deg(p)-\deg(R)+1$. The second case needs to be examined only if $-2\lambda'/\lambda$ is an integer d greater than $\deg(p)-\deg(R)+1$.

OK, we now have enough facts to perform Step 2 of Gosper’s two-step 
algorithm: By trying at most two values of d, we can discover s(k), whenever 
equation (5.122) has a polynomial solution. If s(k) exists, we can plug it 
into (5.121) and we have our T. If it doesn’t, we’ve proved that t(k) is not 
summable in hypergeometric terms. 
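Step 2 can be sketched in a few dozen lines. This is our own illustrative implementation, not Gosper’s code. Polynomials are represented as coefficient lists with the constant term first. The sketch tries the (at most two) candidate degrees from the analysis above. For each candidate it solves the linear equations for the α’s by exact Gauss-Jordan elimination, and it returns None when no polynomial s(k) exists. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Polynomials are coefficient lists, constant term first (illustrative helpers).


def trim(p):
    """Convert to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def deg(p):
    return len(trim(p)) - 1           # the zero polynomial gets degree -1


def combine(p, q, sign=1):
    """p + sign*q."""
    size = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + sign * (q[i] if i < len(q) else 0)
                 for i in range(size)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)


def shift1(p):
    """Coefficients of p(k+1), expanding each (k+1)^j by the binomial theorem."""
    out = [Fraction(0)] * len(p)
    for j, c in enumerate(p):
        for i in range(j + 1):
            out[i] += c * comb(j, i)
    return trim(out)


def solve(columns, rhs):
    """Find x with sum_j x[j]*columns[j] == rhs as polynomials, or None."""
    rows = max([len(rhs)] + [len(c) for c in columns])
    A = [[c[i] if i < len(c) else Fraction(0) for c in columns]
         + [rhs[i] if i < len(rhs) else Fraction(0)] for i in range(rows)]
    pivots, row = [], 0
    for col in range(len(columns)):
        piv = next((i for i in range(row, rows) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[row], A[piv] = A[piv], A[row]
        pv = A[row][col]
        A[row] = [v / pv for v in A[row]]
        for i in range(rows):
            if i != row and A[i][col] != 0:
                f = A[i][col]
                A[i] = [v - f * w for v, w in zip(A[i], A[row])]
        pivots.append(col)
        row += 1
    if any(A[i][-1] != 0 for i in range(row, rows)):
        return None                   # the linear equations are inconsistent
    x = [Fraction(0)] * len(columns)
    for i, col in enumerate(pivots):
        x[col] = A[i][-1]
    return x


def gosper_step2(p, q, r):
    """Polynomial s(k) with p(k) = q(k)s(k+1) - r(k)s(k), as in (5.122).

    Returns [alpha_0, ..., alpha_d], or None if no polynomial solution exists.
    """
    p, q, r = trim(p), trim(q), trim(r)
    Q, R = combine(q, r, -1), combine(q, r)
    dQ, dR = deg(Q), deg(R)
    if dQ >= dR:
        candidates = [deg(p) - dQ]
    else:
        candidates = [deg(p) - dR + 1]
        lam = R[dR]
        lam1 = Q[dR - 1] if 0 <= dR - 1 < len(Q) else Fraction(0)
        alt = -2 * lam1 / lam
        if alt.denominator == 1 and alt > candidates[0]:
            candidates.append(int(alt))
    for d in candidates:
        if d < 0:
            continue
        columns = []
        for j in range(d + 1):        # image of k^j under s -> q s(k+1) - r s(k)
            mono = [0] * j + [1]
            columns.append(combine(mul(q, shift1(mono)), mul(r, trim(mono)), -1))
        x = solve(columns, p)
        if x is not None:
            return x
    return None
```

Here `gosper_step2([1], [-5, 1], [0, 1])` corresponds to the case n = 5 of the example that follows ($p = 1$, $q = k-n$, $r = k$). It returns $[-1/5]$, that is, $\alpha_0 = -1/n$.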

Time for an example: Let’s try the partial sum (5.114). Gosper’s method 
should be able to deduce the value of

$$\sum\binom nk(-1)^k\,\delta k$$

for any fixed n, so we seek the sum of

$$t(k) = \frac{n!\,(-1)^k}{k!\,(n-k)!}.$$


Step 1 is to put the term ratio into the required form (5.117); we have

$$\frac{t(k+1)}{t(k)} = \frac{k-n}{k+1} = \frac{p(k+1)\,q(k)}{p(k)\,r(k+1)},$$

so we simply take $p(k) = 1$, $q(k) = k-n$, and $r(k) = k$. This choice of p, q, and r satisfies (5.118), unless n is a negative integer; let’s suppose it isn’t.

Now we do Step 2. According to (5.124), we should consider the polynomials $Q(k) = -n$ and $R(k) = 2k-n$. Since R has larger degree than Q, we need to look at two cases. Either $d = \deg(p)-\deg(R)+1$, which is 0; or $d = -2\lambda'/\lambda$ where $\lambda' = -n$ and $\lambda = 2$, hence $d = n$. The first case is nicer, because it doesn’t require n to be a positive integer, so let’s try it first; we’ll need to try the other possibility for d only if the first case fails. Assuming that d = 0, the value of s(k) is simply $\alpha_0$, and equation (5.122) reduces to

$$1 = (k-n)\alpha_0 - k\alpha_0.$$

Hence we choose $\alpha_0 = -1/n$. This satisfies the equation and gives

$$T(k) = \frac{r(k)s(k)t(k)}{p(k)} = k\cdot\Bigl(\frac{-1}{n}\Bigr)\binom nk(-1)^k = \binom{n-1}{k-1}(-1)^{k-1}, \qquad\text{if } n\ne0,$$

precisely the answer we were hoping to confirm.

If we apply the same method to find the indefinite sum $\sum\binom nk\,\delta k$, without the $(-1)^k$, everything will be almost the same except that q(k) will be $n-k$; hence $Q(k) = n-2k$ will have greater degree than $R(k) = n$, and we will conclude that d has the impossible value $\deg(p)-\deg(Q) = -1$. (The polynomial s(k) cannot have negative degree, because it cannot be zero.) Therefore the function $\binom nk$ is not summable in hypergeometric terms.

However, once we have eliminated the impossible, whatever remains — 
however improbable — must be the truth (according to S. Holmes [ 83 ]). When 
we defined p, q, and r in Step 1 , we decided to ignore the possibility that n 
might be a negative integer. What if it is? Let’s set $n = -N$, where N is positive. Then the term ratio for $\sum\binom nk\,\delta k$ is

$$\frac{t(k+1)}{t(k)} = \frac{-(k+N)}{k+1} = \frac{p(k+1)\,q(k)}{p(k)\,r(k+1)},$$


and it should be represented by $p(k) = (k+1)^{\overline{N-1}}$, $q(k) = -1$, $r(k) = 1$, according to (5.119). Step 2 of Gosper’s algorithm now tells us to look for a polynomial s(k) of degree $d = N-1$; maybe there’s hope after all. For example, when N = 2 recurrence (5.122) says that we should solve

$$k+1 = -\bigl((k+1)\alpha_1+\alpha_0\bigr) - (k\alpha_1+\alpha_0).$$

Equating coefficients of k and 1 tells us that

$$1 = -\alpha_1-\alpha_1; \qquad 1 = -\alpha_1-\alpha_0-\alpha_0;$$
“Excellent, Holmes!” “Elementary, my dear Watson.”


hence $s(k) = -\frac12k-\frac14$ is a solution, and

$$T(k) = \frac{-\frac12k-\frac14}{k+1}\binom{-2}{k} = (-1)^{k-1}\,\frac{2k+1}{4}.$$

Can this be the desired sum? Yes, it checks out:

$$(-1)^k\,\frac{2k+3}{4} - (-1)^{k-1}\,\frac{2k+1}{4} = (-1)^k(k+1) = \binom{-2}{k}.$$

Incidentally, we can write this summation formula in another form, by 
attaching an upper limit: 



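$$\sum_{0\le k<m}\binom{-2}{k} = (-1)^{k-1}\,\frac{2k+1}{4}\,\bigg|_0^m = \frac{1-(-1)^m(2m+1)}{4} = (-1)^{m-1}\Bigl\lceil\frac m2\Bigr\rceil, \qquad\text{integer } m\ge0.$$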


This representation conceals the fact that $\binom{-2}{k}$ is summable in hypergeometric
terms, because $\lceil m/2\rceil$ is not a hypergeometric term. (See exercise 12.)
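The closed forms above are easy to confirm numerically. Here is a small sketch of our own, using exact arithmetic, that checks $\Delta T(k) = \binom{-2}{k}$ and the partial sums $(-1)^{m-1}\lceil m/2\rceil$. The helper names are illustrative.

```python
from fractions import Fraction


def sign(e):
    """(-1)^e for any integer e (an int, never a float)."""
    return -1 if e % 2 else 1


def T(k):
    """The indefinite sum found above: (-1)^(k-1) (2k+1)/4."""
    return sign(k - 1) * Fraction(2 * k + 1, 4)


def binom_minus2(k):
    """C(-2, k) = (-1)^k (k+1) for integers k >= 0."""
    return sign(k) * (k + 1)


for k in range(20):
    assert T(k + 1) - T(k) == binom_minus2(k)

for m in range(20):
    partial = sum(binom_minus2(k) for k in range(m))
    ceil_half = -(-m // 2)               # integer ceiling of m/2
    assert partial == T(m) - T(0) == sign(m - 1) * ceil_half
```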

A problem might arise in the denominator of (5.121) if $p(k) = 0$ for
some integer k. Exercise 97 gives some insight into what can be done in such 
situations. 

Notice that we need not bother to compile a catalog of indefinitely 
summable hypergeometric terms, analogous to the database of definite hyper- 
geometric sums mentioned earlier in this chapter, because Gosper’s algorithm 
provides a quick, uniform method that works in all summable cases. 

Marko Petkovšek [291] has found a nice way to generalize Gosper’s algorithm to more complicated inversion problems, by showing how to determine all hypergeometric terms T(k) that satisfy the lth-order recurrence

$$t(k) = p_l(k)T(k+l) + \cdots + p_1(k)T(k+1) + p_0(k)T(k), \tag{5.125}$$

given any hypergeometric term t(k) and polynomials $p_l(k),\dots,p_1(k),p_0(k)$.


5.8 MECHANICAL SUMMATION 

Gosper’s algorithm, beautiful as it is, finds a closed form for only a 
few of the binomial sums we meet in practice. But we need not stop there. 
Doron Zeilberger [383] showed how to extend Gosper’s algorithm so that it 
becomes even more beautiful, making it succeed in vastly more cases. With 
Zeilberger’s extension we can handle summation over all k, not just partial
sums, so we have an alternative to the hypergeometric methods of Sections 
5.5 and 5.6. Moreover, as with Gosper’s original method, the calculations can 
be done by computer, almost blindly; we need not rely on cleverness and luck. 

The basic idea is to regard the term we want to sum as a function t(n, k) 
of two variables n and k. (In Gosper’s algorithm we wrote just t(k).) When 
t(n, k) does not turn out to be indefinitely summable in hypergeometric terms, 
with respect to k — and let’s face it, relatively few functions are — Zeilberger 
noticed that we can often modify t(n, k) in order to obtain another term that 
is indefinitely summable. For example, it often turns out in practice that 
$\beta_0(n)t(n,k) + \beta_1(n)t(n+1,k)$ is indefinitely summable with respect to k, for
appropriate polynomials $\beta_0(n)$ and $\beta_1(n)$. And when we carry out the sum
with respect to k, we obtain a recurrence in n that solves our problem. 

Let’s start with a simple case in order to get familiar with this general 
approach. Suppose we have forgotten the binomial theorem, and we want to 
evaluate $\sum_k\binom nk z^k$. How could we discover the answer, without clairvoyance
or inspired guesswork? Earlier in this chapter, for example in Problem 3 of
Section 5.2, we learned how to replace $\binom nk$ by $\binom{n-1}{k}+\binom{n-1}{k-1}$ and to fiddle
around with the result. But there’s a more systematic way to proceed. 

Let $t(n,k) = \binom nk z^k$ be the quantity we want to sum. Gosper’s algorithm tells us that we can’t evaluate the partial sums $\sum_{k<m}t(n,k)$ for arbitrary n in hypergeometric terms, except in the case $z = -1$. So let’s consider a more general term

$$\hat t(n,k) = \beta_0(n)\,t(n,k) + \beta_1(n)\,t(n+1,k) \tag{5.126}$$

instead. We’ll look for values of $\beta_0(n)$ and $\beta_1(n)$ that make Gosper’s algorithm succeed. First we want to simplify (5.126) by using the relation between $t(n+1,k)$ and $t(n,k)$ to eliminate $t(n+1,k)$ from the expression. Since

$$\frac{t(n+1,k)}{t(n,k)} = \frac{(n+1)!\,z^k}{(n+1-k)!\,k!}\cdot\frac{(n-k)!\,k!}{n!\,z^k} = \frac{n+1}{n+1-k},$$

we have

$$\hat t(n,k) = \hat p(n,k)\,\frac{t(n,k)}{n+1-k},$$

where

$$\hat p(n,k) = (n+1-k)\beta_0(n) + (n+1)\beta_1(n).$$

Or without looking on page 174.
This time I remembered why r(n,k) isn’t k + 1.


We now apply Gosper’s algorithm to $\hat t(n,k)$, with n held fixed, first writing

$$\frac{\hat t(n,k+1)}{\hat t(n,k)} = \frac{p(n,k+1)}{p(n,k)}\,\frac{q(n,k)}{r(n,k+1)} \tag{5.127}$$

as in (5.117). Gosper’s method would find such a representation by starting with $p(n,k) = 1$, but with Zeilberger’s extension we are better off starting with $p(n,k) = \hat p(n,k)$. Notice that if we set $\bar t(n,k) = \hat t(n,k)/\hat p(n,k)$ and $\bar p(n,k) = p(n,k)/\hat p(n,k)$, equation (5.127) is equivalent to

$$\frac{\bar t(n,k+1)}{\bar t(n,k)} = \frac{\bar p(n,k+1)}{\bar p(n,k)}\,\frac{q(n,k)}{r(n,k+1)}. \tag{5.128}$$

So we can find p, q, and r satisfying (5.127) by finding $\bar p$, q, and r satisfying (5.128), starting with $\bar p(n,k) = 1$. This makes life easy, because $\bar t(n,k)$ does not involve the unknown quantities $\beta_0(n)$ and $\beta_1(n)$ that appear in $\hat t(n,k)$. In our case $\bar t(n,k) = t(n,k)/(n+1-k) = n!\,z^k/\bigl((n+1-k)!\,k!\bigr)$, so we have

$$\frac{\bar t(n,k+1)}{\bar t(n,k)} = \frac{(n+1-k)z}{k+1};$$

we may take $q(n,k) = (n+1-k)z$ and $r(n,k) = k$. These polynomials in k are supposed to satisfy condition (5.118). If they don’t, we’re supposed to remove factors from q and r and include corresponding factors (5.119) in $\bar p(n,k)$; but we should do this only when the quantity $\alpha-\beta$ in (5.118) is a positive integer constant, independent of n, because we want our calculations to be valid for arbitrary n. (The formulas we derive will, in fact, be valid even when n and k are not integers, using the generalized factorials (5.83).)

Our first choices of q and r do satisfy (5.118), in this sense, so we can 
move right on to Step 2 of Gosper’s algorithm: We want to solve the analog 
of (5.122), using (5.127) in place of (5.117). So we want to solve

$$p(n,k) = q(n,k)s(n,k+1) - r(n,k)s(n,k) \tag{5.129}$$

for the secret polynomial

$$s(n,k) = \alpha_d(n)k^d + \alpha_{d-1}(n)k^{d-1} + \cdots + \alpha_0(n). \tag{5.130}$$

(The coefficients of s are considered to be functions of n, not just constants.) 
In our case equation (5.129) is

$$(n+1-k)\beta_0(n) + (n+1)\beta_1(n) = (n+1-k)z\,s(n,k+1) - k\,s(n,k),$$

and we regard this as a polynomial equation in k with coefficients that are 
functions of n. As before, we determine the degree d of s by considering 
$Q(n,k) = q(n,k) - r(n,k)$ and $R(n,k) = q(n,k) + r(n,k)$. Since $\deg(Q) = \deg(R) = 1$ (assuming that $z\ne-1$), we have $d = \deg(p)-\deg(Q) = 0$ and $s(n,k) = \alpha_0(n)$ is independent of k. Our equation becomes

$$(n+1-k)\beta_0(n) + (n+1)\beta_1(n) = (n+1-k)z\,\alpha_0(n) - k\,\alpha_0(n);$$

and by equating powers of k we get the equivalent k-free equations

$$\begin{aligned}
(n+1)\beta_0(n) + (n+1)\beta_1(n) - (n+1)z\,\alpha_0(n) &= 0,\\
-\beta_0(n) + (z+1)\alpha_0(n) &= 0.
\end{aligned}$$

Hence we have a solution to (5.129) with

$$\beta_0(n) = z+1, \qquad \beta_1(n) = -1, \qquad \alpha_0(n) = s(n,k) = 1.$$

(By chance, n has dropped out.)

The degree function deg(Q) refers to k, treating n as constant.

We have discovered, by a purely mechanical method, that the term $\hat t(n,k) = (z+1)t(n,k) - t(n+1,k)$ is summable in hypergeometric terms. In other words,

$$\hat t(n,k) = T(n,k+1) - T(n,k), \tag{5.131}$$


where T(n,k) is a hypergeometric term in k. What is this T(n,k)? According to (5.121) and (5.128), we have

$$T(n,k) = \frac{r(n,k)s(n,k)\hat t(n,k)}{p(n,k)} = r(n,k)s(n,k)\bar t(n,k), \tag{5.132}$$

because $\bar p(n,k) = 1$. (Indeed, $\bar p(n,k)$ almost always turns out to be 1 in practice.) Hence

$$T(n,k) = \frac{k}{n+1-k}\,t(n,k) = \frac{k}{n+1-k}\binom nk z^k = \binom{n}{k-1}z^k.$$


And sure enough, everything checks out; equation (5.131) is true:

$$(z+1)\binom nk z^k - \binom{n+1}{k}z^k = \binom nk z^{k+1} - \binom{n}{k-1}z^k.$$


But we don’t actually need to know T(n,k) precisely, because we are 
going to sum t(n, k) over all integers k. All we need to know is that T(n, k) is 
nonzero for only finitely many values of k, when n is any given nonnegative 
integer. Then the sum of $T(n,k+1)-T(n,k)$ over all k must telescope to 0.
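The certificate can be checked directly. The sketch below is our own, using exact fractions with the sample value $z = 3/5$. It first verifies $(z+1)t(n,k) - t(n+1,k) = T(n,k+1) - T(n,k)$ with $T(n,k) = \binom n{k-1}z^k$ over a range of k. That range includes values of k where both sides vanish. The sketch then confirms that summing over k gives $S_{n+1} = (z+1)S_n$.

```python
from fractions import Fraction


def binom(n, k):
    """C(n, k) for an integer n >= 0 and any integer k (zero outside 0..n)."""
    if k < 0 or k > n:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (n - j) / (j + 1)
    return result


def t(n, k, z):
    return binom(n, k) * z ** k


def T(n, k, z):
    """Certificate from (5.132): T(n, k) = C(n, k-1) z^k."""
    return binom(n, k - 1) * z ** k


z = Fraction(3, 5)
for n in range(8):
    for k in range(-2, n + 4):
        # (z+1) t(n,k) - t(n+1,k) = T(n,k+1) - T(n,k)
        assert (z + 1) * t(n, k, z) - t(n + 1, k, z) == T(n, k + 1, z) - T(n, k, z)
    # Summing over k telescopes to 0, giving S_{n+1} = (z+1) S_n
    S_n = sum(t(n, k, z) for k in range(n + 1))
    S_next = sum(t(n + 1, k, z) for k in range(n + 2))
    assert S_next == (z + 1) * S_n == (z + 1) ** (n + 1)
```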

Let $S_n = \sum_k t(n,k) = \sum_k\binom nk z^k$; this is the sum we started with, and
we’re now ready to compute it, because we now know a lot about t(n, k). The 
In fact, $\lim_{k\to\infty}T(n,k) = 0$ when $|z|<1$ and n is any complex number. So (5.133) is true for all n, and in particular $S_n = (z+1)^n$ when n is a negative integer.


Gosper-Zeilberger procedure has deduced that

$$\sum_k\bigl((z+1)t(n,k) - t(n+1,k)\bigr) = 0.$$

But this sum is $(z+1)\sum_kt(n,k) - \sum_kt(n+1,k) = (z+1)S_n - S_{n+1}$. Therefore we have

$$S_{n+1} = (z+1)S_n. \tag{5.133}$$

Aha! This is a recurrence we know how to solve, provided that we know $S_0$. And obviously $S_0 = 1$. Hence we deduce that $S_n = (z+1)^n$, for all integers $n\ge0$. QED.

Let’s look back at this computation and summarize what we did, in a 
form that will apply also to other summands t(n,k). The Gosper-Zeilberger 
algorithm can be formulated as follows, when t(n,k) is given:

0 Set $l := 0$. (We’ll seek recurrences in n of order l.)

1 Let $\hat t(n,k) = \beta_0(n)t(n,k) + \cdots + \beta_l(n)t(n+l,k)$, where $\beta_0(n),\dots,\beta_l(n)$ are unknown functions. Use properties of t(n,k) to find a linear combination $\hat p(n,k)$ of $\beta_0(n),\dots,\beta_l(n)$ with coefficients that are polynomials in n and k, so that $\hat t(n,k)$ can be written in the form $\hat p(n,k)\bar t(n,k)$, where $\bar t(n,k)$ is a hypergeometric term in k. Find polynomials $\bar p(n,k)$, q(n,k), r(n,k) so that the term ratio of $\bar t(n,k)$ is expressed in the form (5.128), where q(n,k) and r(n,k) satisfy Gosper’s condition (5.118). Set $p(n,k) = \hat p(n,k)\,\bar p(n,k)$.

2a Set $d_Q := \deg(q-r)$, $d_R := \deg(q+r)$, and

$$d := \begin{cases}\deg(p) - d_Q, & \text{if } d_Q\ge d_R;\\ \deg(p) - d_R + 1, & \text{if } d_Q < d_R.\end{cases}$$

2b If $d\ge0$, define s(n,k) by (5.130), and consider the linear equations in $\alpha_0,\dots,\alpha_d,\beta_0,\dots,\beta_l$ obtained by equating coefficients of powers of k in the fundamental equation (5.129). If these equations have a solution with $\beta_0,\dots,\beta_l$ not all zero, go to Step 4. Otherwise, if $d_Q < d_R$ and if $-2\lambda'/\lambda$ is an integer greater than d, where λ is the coefficient of $k^{d_R}$ in q + r and λ′ is the coefficient of $k^{d_R-1}$ in q − r, set $d := -2\lambda'/\lambda$ and repeat Step 2b.

3 (The term $\hat t(n,k)$ isn’t hypergeometrically summable.) Increase l by 1 and go back to Step 1.

4 (Success.) Set $T(n,k) := r(n,k)s(n,k)\hat t(n,k)/p(n,k)$. The algorithm has discovered that $\hat t(n,k) = T(n,k+1) - T(n,k)$.

We’ll prove later that this algorithm terminates successfully whenever t(n, k) 
belongs to a large class of terms called proper terms. 
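Step 2b is nothing more than linear algebra over the rationals once n and the other parameters are given numeric values. The sketch below is our own simplification. It handles only $l = 1$ and $d = 0$, so that $s(n,k) = \alpha_0$. It finds a nonzero null vector of the linear equations that come from (5.129), using exact Gauss-Jordan elimination. Fixing numeric values for n, z, a, b is an assumption of the sketch; the algorithm itself treats them symbolically. The second example anticipates the Vandermonde computation discussed next.

```python
from fractions import Fraction


def nullspace_vector(columns):
    """One nonzero x with sum_j x[j]*columns[j] == 0, or None if only x = 0 works.

    Columns are coefficient lists in k (constant term first). Uses exact
    Gauss-Jordan elimination and sets the first free unknown to 1.
    """
    rows = max(len(c) for c in columns)
    A = [[Fraction(c[i]) if i < len(c) else Fraction(0) for c in columns]
         for i in range(rows)]
    pivots, row = [], 0
    for col in range(len(columns)):
        piv = next((i for i in range(row, rows) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[row], A[piv] = A[piv], A[row]
        pv = A[row][col]
        A[row] = [v / pv for v in A[row]]
        for i in range(rows):
            if i != row and A[i][col] != 0:
                f = A[i][col]
                A[i] = [v - f * w for v, w in zip(A[i], A[row])]
        pivots.append(col)
        row += 1
    free = [c for c in range(len(columns)) if c not in pivots]
    if not free:
        return None
    x = [Fraction(0)] * len(columns)
    x[free[0]] = Fraction(1)
    for i, col in enumerate(pivots):
        x[col] = -A[i][free[0]]
    return x


def order1_degree0(phat_beta0, phat_beta1, q, r):
    """Solve (5.129) with l = 1 and s(n,k) = alpha_0 for [beta_0, beta_1, alpha_0].

    phat_beta0 and phat_beta1 are the polynomials in k multiplying beta_0 and
    beta_1 in phat(n,k); the equation is beta_0*phat_beta0 + beta_1*phat_beta1
    - alpha_0*(q - r) = 0.
    """
    size = max(len(q), len(r))
    q_minus_r = [(q[i] if i < len(q) else 0) - (r[i] if i < len(r) else 0)
                 for i in range(size)]
    return nullspace_vector([phat_beta0, phat_beta1, [-c for c in q_minus_r]])


# Binomial theorem, n = 4, z = 2/3: phat = (n+1-k) beta_0 + (n+1) beta_1,
# q = (n+1-k) z, r = k; a multiple of (z+1, -1, 1) is expected.
n, z = 4, Fraction(2, 3)
print(order1_degree0([n + 1, -1], [n + 1], [(n + 1) * z, -z], [0, 1]))

# Vandermonde, a = 5/2, b = 7/3, n = 4: phat = (n+1-k) beta_0 + (b-n+k) beta_1,
# q = (n+1-k)(a-k), r = (b-n+k)k; a multiple of (a+b-n, -n-1, 1) is expected.
a, b = Fraction(5, 2), Fraction(7, 3)
print(order1_degree0([n + 1, -1], [b - n, 1], [(n + 1) * a, -(n + 1) - a, 1], [0, b - n, 1]))
```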
The binomial theorem can be derived in many ways, so our first example
of the Gosper-Zeilberger approach was more instructive than impressive. Let’s
tackle Vandermonde’s convolution next. Can Gosper and Zeilberger deduce
algorithmically that $\sum_k \binom{a}{k}\binom{b}{n-k}$ has a simple form? The algorithm starts
with l = 0, which essentially reproduces Gosper’s original algorithm, trying
to see if $\binom{a}{k}\binom{b}{n-k}$ is summable in hypergeometric terms. Surprise: That term
actually does turn out to be summable, if a + b is a specific nonnegative integer
(see exercise 94). But we are interested in general values of a and b, and the
algorithm quickly discovers that the indefinite sum is not a hypergeometric
term in general. So l is increased from 0 to 1, and the algorithm proceeds to
try $\hat t(n,k) = \beta_0(n)t(n,k) + \beta_1(n)t(n+1,k)$ instead. The next step, as in
our derivation of the binomial theorem, is to write $\hat t(n,k) = \hat p(n,k)\,\bar t(n,k)$,
where $\hat p(n,k)$ is obtained by clearing fractions in t(n + 1, k)/t(n, k). In this
case (the reader should please work along on a piece of scratch paper to
check all these calculations; they aren’t as hard as they look) everything
goes through in an analogous fashion, but now with

$$\begin{aligned}
\hat p(n,k) &= (n+1-k)\beta_0(n) + (b-n+k)\beta_1(n) = p(n,k),\\
\bar t(n,k) &= t(n,k)/(n+1-k) = a!\,b!/(a-k)!\,k!\,(b-n+k)!\,(n+1-k)!,\\
q(n,k) &= (n+1-k)(a-k),\\
r(n,k) &= (b-n+k)k.
\end{aligned}$$


Step 2a finds deg(q − r) < deg(q + r), and d = deg(p) − deg(q + r) + 1 = 0,
so s(n, k) is again independent of k. Gosper’s fundamental equation (5.129)
is equivalent to two equations in three unknowns,

$$\begin{aligned}
(n+1)\beta_0(n) + (b-n)\beta_1(n) - (n+1)a\,\alpha_0(n) &= 0,\\
-\beta_0(n) + \beta_1(n) + (a+b+1)\alpha_0(n) &= 0,
\end{aligned}$$

which have the solution

$$\beta_0(n) = a+b-n,\qquad \beta_1(n) = -n-1,\qquad \alpha_0(n) = 1.$$
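Both equations are homogeneous and linear in the unknowns $\beta_0(n)$, $\beta_1(n)$, $\alpha_0(n)$. So for any particular numbers a, b, n, a nonzero solution can be found by exact Gaussian elimination.

The sketch below uses Python’s `fractions` module, and the function names are illustrative only. It sets the first free unknown to 1 and reads the remaining unknowns off the reduced row echelon form.

```python
from fractions import Fraction

def nullspace_vector(rows):
    """One nonzero solution x of rows . x = 0 (Gauss-Jordan over the rationals),
    or None if only the zero solution exists."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0])
    pivots, r = [], 0
    for c in range(ncols):
        if r == len(m):
            break
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue                           # no pivot in this column: a free unknown
        m[r], m[piv] = m[piv], m[r]
        m[r] = [v / m[r][c] for v in m[r]]     # scale the pivot row so the pivot is 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    x = [Fraction(0)] * ncols
    x[free[0]] = Fraction(1)                   # set the first free unknown to 1
    for i, c in enumerate(pivots):
        x[c] = -m[i][free[0]]
    return x

def vandermonde_rows(a, b, n):
    """Coefficients of (beta_0, beta_1, alpha_0) in the two equations above."""
    return [[n + 1, b - n, -(n + 1) * a],
            [-1, 1, a + b + 1]]
```

For a = 3, b = 5, n = 2 the solver returns the vector (6, −3, 1), held as Fractions. This agrees with $\beta_0 = a+b-n$, $\beta_1 = -n-1$, $\alpha_0 = 1$.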

We conclude that (a + b − n)t(n, k) − (n + 1)t(n + 1, k) is summable with
respect to k; hence if $S_n = \sum_k \binom{a}{k}\binom{b}{n-k}$ the recurrence

$$S_{n+1} = \frac{a+b-n}{n+1}\,S_n$$

holds; thus $S_n = \binom{a+b}{n}$ since $S_0 = 1$. A piece of cake.

The crucial point is that the Gosper-Zeilberger method always leads to equations that are linear in the unknown α’s and β’s, because the left side of (5.129) is linear in the β’s and the right side is linear in the α’s.
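The claim that (a + b − n)t(n, k) − (n + 1)t(n + 1, k) telescopes can be checked exactly for small nonnegative integers a, b, n.

The sketch below assumes the convention 1/m! = 0 for negative integers m. Under that convention, t(n, k) and $T(n,k) = r(n,k)s(n,k)\bar t(n,k)$ vanish outside their natural ranges. The names `rf`, `t`, `T`, `certificate_holds` and `S` are hypothetical helpers.

```python
from fractions import Fraction
from math import comb, factorial

def rf(m):
    """Reciprocal factorial 1/m!, taken to be 0 for negative integers m."""
    return Fraction(0) if m < 0 else Fraction(1, factorial(m))

def t(a, b, n, k):
    # t(n, k) = C(a, k) C(b, n - k), written with factorials
    return factorial(a) * factorial(b) * rf(a - k) * rf(k) * rf(b - n + k) * rf(n - k)

def T(a, b, n, k):
    # T(n, k) = r(n, k) s(n, k) tbar(n, k) with r = (b - n + k)k and s = alpha_0 = 1
    tbar = factorial(a) * factorial(b) * rf(a - k) * rf(k) * rf(b - n + k) * rf(n + 1 - k)
    return (b - n + k) * k * tbar

def certificate_holds(a, b, n):
    """Check beta_0 t(n,k) + beta_1 t(n+1,k) = T(n,k+1) - T(n,k) for a range of k."""
    beta0, beta1 = a + b - n, -n - 1
    return all(beta0 * t(a, b, n, k) + beta1 * t(a, b, n + 1, k)
               == T(a, b, n, k + 1) - T(a, b, n, k)
               for k in range(-2, n + 4))

def S(a, b, n):
    return sum(t(a, b, n, k) for k in range(n + 1))
```

Summing the identity over all k makes the right side vanish, which gives the recurrence $(n+1)S_{n+1} = (a+b-n)S_n$. Comparing `S(a, b, n)` with `comb(a + b, n)` then confirms Vandermonde’s convolution for the same small values.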

What about the Saalschützian triple-binomial identity in (5.28)? The
proof of (5.28) in exercise 43 is interesting, but it requires inspiration. When
we transform an art into a science, we aim to replace inspiration by perspiration;
so let’s see if the Gosper-Zeilberger approach to summation is able to
discover and prove (5.28) in a purely mechanical way. For convenience we
make the substitutions m = b + d, n = a, r = a + b + c + d, s = a + b + c,
so that (5.28) takes the more symmetrical form

$$\sum_k \frac{(a+b+c+d+k)!}{(a-k)!\,(b-k)!\,(c+k)!\,(d+k)!\,k!} = \frac{(a+b+c+d)!\,(a+b+c)!\,(a+b+d)!}{a!\,b!\,(a+c)!\,(a+d)!\,(b+c)!\,(b+d)!}. \qquad (5.134)$$

Deciding what parameter to call n is the only nonmechanical part.


To make the sum finite, we assume that either a or b is a nonnegative integer. 

Let t(n, k) = (n + b + c + d + k)!/(n − k)! (b − k)! (c + k)! (d + k)! k!
and $\hat t(n,k) = \beta_0(n)t(n,k) + \beta_1(n)t(n+1,k)$. Proceeding along a path that
is beginning to become well worn, we set


$$\begin{aligned}
\hat p(n,k) &= (n+1-k)\beta_0(n) + (n+1+b+c+d+k)\beta_1(n) = p(n,k),\\
\bar t(n,k) &= \frac{t(n,k)}{n+1-k} = \frac{(n+b+c+d+k)!}{(n+1-k)!\,(b-k)!\,(c+k)!\,(d+k)!\,k!},\\
q(n,k) &= (n+b+c+d+k+1)(n+1-k)(b-k),\\
r(n,k) &= (c+k)(d+k)k,
\end{aligned}$$


and we try to solve (5.129) for s(n, k). Again deg(q − r) < deg(q + r), but this
time deg(p) − deg(q + r) + 1 = −1, so it looks like we’re stuck. However, Step 2b
has an important second choice, $d = -2\lambda'/\lambda$, for the degree of s; we had better
try it now before we give up. Here $R(n,k) = q(n,k) + r(n,k) = 2k^3 + \cdots$, so
$\lambda = 2$, while the polynomial $Q(n,k) = q(n,k) - r(n,k)$ almost miraculously
turns out to have degree 1 in k: the coefficient of $k^2$ vanishes! Therefore
$\lambda' = 0$; Gosper allows us to take d = 0 and $s(n,k) = \alpha_0(n)$.

Notice that $\lambda'$ is not the leading coefficient of Q, although $\lambda$ is the leading coefficient of R. The number $\lambda'$ is the coefficient of $k^{\deg(R)-1}$ in Q.

The equations to be solved are now 

$$\begin{aligned}
(n+1)\beta_0(n) + (n+1+b+c+d)\beta_1(n) - (n+1)(n+1+b+c+d)\,b\,\alpha_0(n) &= 0,\\
-\beta_0(n) + \beta_1(n) - \bigl((n+1)b - (n+1+b)(n+1+b+c+d) - cd\bigr)\alpha_0(n) &= 0;
\end{aligned}$$


and we find

$$\begin{aligned}
\beta_0(n) &= (n+1+b+c)(n+1+b+d)(n+1+b+c+d),\\
\beta_1(n) &= -(n+1)(n+1+c)(n+1+d),\\
\alpha_0(n) &= 2n+2+b+c+d,
\end{aligned}$$

after only a modest amount of perspiration. The identity (5.134) follows
immediately.

Perspiration flows, identity follows.

A similar proof of (5.134) can be obtained if we work with n = d instead
of n = a. (See exercise 99.)
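Summing the discovered relation over all k gives $\beta_0(n)S_n + \beta_1(n)S_{n+1} = 0$. That recurrence is what makes (5.134) “follow immediately.”

The exact-arithmetic spot check below assumes the convention 1/m! = 0 for negative m, and its helper names are hypothetical. It tests two things for small nonnegative integers:

- the certificate $T(n,k) = r(n,k)\,\alpha_0(n)\,\bar t(n,k)$;
- identity (5.134) itself.

```python
from fractions import Fraction
from math import factorial

def rf(m):
    """1/m!, taken to be 0 when m is a negative integer."""
    return Fraction(0) if m < 0 else Fraction(1, factorial(m))

def term(n, b, c, d, k):
    # t(n,k) = (n+b+c+d+k)! / ((n-k)! (b-k)! (c+k)! (d+k)! k!), for k >= 0
    return factorial(n + b + c + d + k) * rf(n - k) * rf(b - k) * rf(c + k) * rf(d + k) * rf(k)

def big_T(n, b, c, d, k):
    # T(n,k) = r(n,k) s(n,k) tbar(n,k), with r = (c+k)(d+k)k and s = alpha_0(n)
    tbar = factorial(n + b + c + d + k) * rf(n + 1 - k) * rf(b - k) * rf(c + k) * rf(d + k) * rf(k)
    return (c + k) * (d + k) * k * (2 * n + 2 + b + c + d) * tbar

def certificate_holds(n, b, c, d):
    beta0 = (n + 1 + b + c) * (n + 1 + b + d) * (n + 1 + b + c + d)
    beta1 = -(n + 1) * (n + 1 + c) * (n + 1 + d)
    return all(beta0 * term(n, b, c, d, k) + beta1 * term(n + 1, b, c, d, k)
               == big_T(n, b, c, d, k + 1) - big_T(n, b, c, d, k)
               for k in range(0, n + 3))

def lhs_5134(a, b, c, d):
    # the summand of (5.134) is term() with n = a
    return sum(term(a, b, c, d, k) for k in range(0, min(a, b) + 1))

def rhs_5134(a, b, c, d):
    f = factorial
    return Fraction(f(a + b + c + d) * f(a + b + c) * f(a + b + d),
                    f(a) * f(b) * f(a + c) * f(a + d) * f(b + c) * f(b + d))
```

Only k ≥ 0 is tested in `certificate_holds`. This keeps every numerator factorial argument nonnegative.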
The Gosper-Zeilberger approach helps us evaluate definite sums over a
restricted range as well as sums over all k. For example, let’s consider

$$S_n(z) = \sum_{k=0}^{n}\binom{n+k}{k}z^k. \qquad (5.135)$$

When $z = \frac12$ we obtained an “unexpected” result in (5.20); would Gosper and
Zeilberger have expected it? Putting $t(n,k) = \binom{n+k}{k}z^k$ leads us to

$$\begin{aligned}
\hat p(n,k) &= (n+1)\beta_0(n) + (n+1+k)\beta_1(n) = p(n,k),\\
\bar t(n,k) &= t(n,k)/(n+1) = (n+k)!\,z^k/k!\,(n+1)!,\\
q(n,k) &= (n+1+k)z,\\
r(n,k) &= k,
\end{aligned}$$

and deg(s) = deg(p) − deg(q − r) = 0. Equation (5.129) is solved by $\beta_0(n) = 1$,
$\beta_1(n) = z - 1$, s(n, k) = 1. Therefore we find


$$t(n,k) + (z-1)t(n+1,k) = T(n,k+1) - T(n,k), \qquad (5.136)$$

where $T(n,k) = r(n,k)s(n,k)\hat t(n,k)/p(n,k) = \binom{n+k}{k-1}z^k$. We can now sum
(5.136) for $0 \le k \le n+1$, getting

$$S_n(z) + t(n,n+1) + (z-1)S_{n+1}(z) = T(n,n+2) - T(n,0) = \binom{2n+2}{n+1}z^{n+2}.$$

But $t(n,n+1) = \binom{2n+1}{n+1}z^{n+1} = \frac12\binom{2n+2}{n+1}z^{n+1}$, so

$$S_{n+1}(z) = \frac{1}{1-z}S_n(z) + \frac{1-2z}{1-z}\binom{2n+1}{n}z^{n+1}. \qquad (5.137)$$


We see immediately that the case $z = \frac12$ is special, and that $S_{n+1}(\frac12) = 2S_n(\frac12)$. Moreover, the recurrence (5.137) can be simplified by applying the
summation factor $(1-z)^{n+1}$ to both sides; this yields the general identity

$$(1-z)^n S_n(z) = 1 + \frac{1-2z}{2-2z}\sum_{k=1}^{n}\binom{2k}{k}\bigl(z(1-z)\bigr)^k, \qquad (5.138)$$


which comparatively few people would have expected before Gosper and Zeilberger
came along. Now the production of such identities is routine.
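Because every quantity here is a rational function of z, the certificate (5.136), the recurrence (5.137) and the identity (5.138) can all be spot-checked with exact fractions. This is a minimal sketch and its helper names are hypothetical. The test points avoid z = 1, where (5.137) divides by zero.

```python
from fractions import Fraction
from math import comb

def binom(m, k):
    """C(m, k) for integer m >= 0, with C(m, k) = 0 when k < 0 or k > m."""
    return comb(m, k) if 0 <= k <= m else 0

def S(n, z):
    # S_n(z) = sum_{0 <= k <= n} C(n+k, k) z^k, as in (5.135)
    return sum(binom(n + k, k) * z**k for k in range(n + 1))

def certificate_holds(n, z):
    # (5.136): t(n,k) + (z-1) t(n+1,k) = T(n,k+1) - T(n,k), with T(n,k) = C(n+k, k-1) z^k
    t = lambda m, k: binom(m + k, k) * z**k
    T = lambda k: binom(n + k, k - 1) * z**k
    return all(t(n, k) + (z - 1) * t(n + 1, k) == T(k + 1) - T(k) for k in range(n + 2))

def recurrence_5137_holds(n, z):
    return S(n + 1, z) == S(n, z) / (1 - z) + (1 - 2 * z) / (1 - z) * binom(2 * n + 1, n) * z**(n + 1)

def identity_5138_holds(n, z):
    rhs = 1 + (1 - 2 * z) / (2 - 2 * z) * sum(binom(2 * k, k) * (z * (1 - z))**k
                                              for k in range(1, n + 1))
    return (1 - z)**n * S(n, z) == rhs
```

At z = ½ the factor 1 − 2z vanishes, so (5.137) reduces to $S_{n+1}(\frac12) = 2S_n(\frac12)$, as noted above.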
How about the similar sum

$$S_n(z) = \sum_{k=0}^{n}\binom{n-k}{k}z^k, \qquad (5.139)$$

which we encountered in (5.74)? Flushed with confidence, we set $t(n,k) = \binom{n-k}{k}z^k$ and proceed to calculate

$$\begin{aligned}
\hat p(n,k) &= (n+1-2k)\beta_0(n) + (n+1-k)\beta_1(n) = p(n,k),\\
\bar t(n,k) &= t(n,k)/(n+1-2k) = (n-k)!\,z^k/k!\,(n+1-2k)!,\\
q(n,k) &= (n+1-2k)(n-2k)z,\\
r(n,k) &= (n+1-k)k.
\end{aligned}$$

But whoa: there’s no way to solve (5.129), if we assume that $z \ne -\frac14$, because
the degree of s would have to be deg(p) − deg(q − r) = −1.

$S_n(-\frac14)$ equals $(n+1)/2^n$.

No problem. We simply add another parameter $\beta_2(n)$ and try $\hat t(n,k) = \beta_0(n)t(n,k) + \beta_1(n)t(n+1,k) + \beta_2(n)t(n+2,k)$ instead:

$$\begin{aligned}
\hat p(n,k) &= (n+1-2k)(n+2-2k)\beta_0(n) + (n+1-k)(n+2-2k)\beta_1(n)\\
&\qquad + (n+1-k)(n+2-k)\beta_2(n) = p(n,k),\\
\bar t(n,k) &= t(n,k)/(n+1-2k)(n+2-2k) = (n-k)!\,z^k/k!\,(n+2-2k)!,\\
q(n,k) &= (n+2-2k)(n+1-2k)z,\\
r(n,k) &= (n+1-k)k.
\end{aligned}$$

Now we can try $s(n,k) = \alpha_0(n)$ and (5.129) does have a solution:

$$\beta_0(n) = z,\qquad \beta_1(n) = 1,\qquad \beta_2(n) = -1,\qquad \alpha_0(n) = 1.$$

We have discovered that

$$z\,t(n,k) + t(n+1,k) - t(n+2,k) = T(n,k+1) - T(n,k),$$

where T(n, k) equals $r(n,k)s(n,k)\hat t(n,k)/p(n,k) = (n+1-k)k\,\bar t(n,k) = \binom{n+1-k}{k-1}z^k$. Summing from k = 0 to k = n gives

$$zS_n(z) + \Bigl(S_{n+1}(z) - \binom{0}{n+1}z^{n+1}\Bigr) - \Bigl(S_{n+2}(z) - \binom{0}{n+2}z^{n+2} - \binom{1}{n+1}z^{n+1}\Bigr) = T(n,n+1) - T(n,0).$$

And $\binom{1}{n+1}z^{n+1} = \binom{0}{n}z^{n+1} = T(n,n+1)$ for all $n \ge 0$, so we obtain

$$S_{n+2}(z) = S_{n+1}(z) + zS_n(z), \qquad n \ge 0. \qquad (5.140)$$
We will study the solution of such recurrences in Chapters 6 and 7; the methods
of those chapters lead directly from (5.140) to the closed form (5.74),
when $S_0(z) = S_1(z) = 1$.
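The three-term certificate, the recurrence (5.140), and the margin remark $S_n(-\frac14) = (n+1)/2^n$ can be spot-checked exactly. This is a minimal sketch with hypothetical helper names. Only 0 ≤ k ≤ n is tested, so every upper index stays nonnegative.

```python
from fractions import Fraction
from math import comb

def binom(m, k):
    """C(m, k) for integer m >= 0 (0 when k < 0 or k > m)."""
    return comb(m, k) if 0 <= k <= m else 0

def S(n, z):
    # S_n(z) = sum_{0 <= k <= n} C(n-k, k) z^k, as in (5.139)
    return sum(binom(n - k, k) * z**k for k in range(n + 1))

def certificate_holds(n, z):
    # z t(n,k) + t(n+1,k) - t(n+2,k) = T(n,k+1) - T(n,k), with T(n,k) = C(n+1-k, k-1) z^k
    t = lambda m, k: binom(m - k, k) * z**k
    T = lambda k: binom(n + 1 - k, k - 1) * z**k
    return all(z * t(n, k) + t(n + 1, k) - t(n + 2, k) == T(k + 1) - T(k)
               for k in range(n + 1))
```

With $S_0(z) = S_1(z) = 1$ the recurrence $S_{n+2} = S_{n+1} + zS_n$ generates the whole sequence.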

One more example, a famous one, will complete the picture. The
French mathematician Roger Apéry solved a long-standing problem in 1978
when he proved that the number $\zeta(3) = 1 + 2^{-3} + 3^{-3} + 4^{-3} + \cdots$ is irrational
[14]. One of the main components of his proof involved the binomial sums

$$A_n = \sum_k \binom{n}{k}^2\binom{n+k}{k}^2, \qquad (5.141)$$

for which he announced a recurrence that other mathematicians were unable
to verify at the time. (The numbers $A_n$ have since become known as Apéry
numbers; we have $A_0 = 1$, $A_1 = 5$, $A_2 = 73$, $A_3 = 1445$, $A_4 = 33001$.)
Finally [356] Don Zagier and Henri Cohen found a proof of Apéry’s claim, and
their proof for this special (but difficult) sum was one of the key clues that
ultimately led Zeilberger to discover the general approach we are discussing.

By now, in fact, we have seen enough examples to make the sum in (5.141)
almost trivial. Putting $t(n,k) = \binom{n}{k}^2\binom{n+k}{k}^2$ and $\hat t(n,k) = \beta_0(n)t(n,k) + \beta_1(n)t(n+1,k) + \beta_2(n)t(n+2,k)$, we try to solve (5.129) with

$$\begin{aligned}
\hat p(n,k) &= (n+1-k)^2(n+2-k)^2\beta_0(n) + (n+1+k)^2(n+2-k)^2\beta_1(n)\\
&\qquad + (n+1+k)^2(n+2+k)^2\beta_2(n) = p(n,k),\\
\bar t(n,k) &= t(n,k)/(n+1-k)^2(n+2-k)^2 = (n+k)!^2/k!^4\,(n+2-k)!^2,\\
q(n,k) &= (n+1+k)^2(n+2-k)^2,\\
r(n,k) &= k^4.
\end{aligned}$$


(First we try doing without $\beta_2$, but that attempt quickly peters out.)


(We don’t worry about the fact that q has the factor (k + n + 1) while r has
the factor k; this does not violate (5.118), because we are regarding n as a
variable parameter, not a fixed integer.) Since $q(n,k) - r(n,k) = -2k^3 + \cdots$,
we are allowed to set $\deg(s) = -2\lambda'/\lambda = 2$, so we take

$$s(n,k) = \alpha_2(n)k^2 + \alpha_1(n)k + \alpha_0(n).$$


With this choice of s, the fundamental equation (5.129) boils down to five equations in
the six unknown quantities $\beta_0(n)$, $\beta_1(n)$, $\beta_2(n)$, $\alpha_0(n)$, $\alpha_1(n)$, $\alpha_2(n)$. For
example, the equation arising from the coefficients of $k^0$ simplifies to

$$\beta_0 + \beta_1 + \beta_2 - \alpha_0 - \alpha_1 - \alpha_2 = 0;$$

the equation arising from the coefficients of $k^4$ is

$$\beta_0 + \beta_1 + \beta_2 + \alpha_1 + (6 + 6n + 2n^2)\alpha_2 = 0.$$
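To see the linear algebra of Step 2b concretely, one can fix n at a number and expand every term of (5.129) as a polynomial in k. Collecting one row per power of k then gives a homogeneous linear system in the six unknowns.

The sketch below does this for the Apéry summand and extracts a nonzero solution by exact Gaussian elimination. Polynomials are stored as coefficient lists, lowest degree first, and all helper names are hypothetical.

```python
from fractions import Fraction

def pmul(p, q):
    """Multiply two polynomials in k given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def ppow(p, e):
    out = [Fraction(1)]
    for _ in range(e):
        out = pmul(out, p)
    return out

def padd(p, q, c=1):
    """Return p + c*q."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + c * (q[i] if i < len(q) else 0) for i in range(n)]

def lin(c0, c1):
    """The linear polynomial c0 + c1*k."""
    return [Fraction(c0), Fraction(c1)]

def apery_system(n):
    """Rows = powers of k; columns = beta_0, beta_1, beta_2, alpha_0, alpha_1, alpha_2."""
    q = pmul(ppow(lin(n + 1, 1), 2), ppow(lin(n + 2, -1), 2))
    r = ppow(lin(0, 1), 4)
    cols = [pmul(ppow(lin(n + 1, -1), 2), ppow(lin(n + 2, -1), 2)),   # multiplies beta_0 in p
            pmul(ppow(lin(n + 1, 1), 2), ppow(lin(n + 2, -1), 2)),    # multiplies beta_1
            pmul(ppow(lin(n + 1, 1), 2), ppow(lin(n + 2, 1), 2))]     # multiplies beta_2
    for j in range(3):
        # alpha_j appears on the right of (5.129) as q(k)(k+1)^j - r(k)k^j; move it left
        rhs = padd(pmul(q, ppow(lin(1, 1), j)), pmul(r, ppow(lin(0, 1), j)), -1)
        cols.append([-x for x in rhs])
    height = max(len(col) for col in cols)
    return [[col[i] if i < len(col) else Fraction(0) for col in cols] for i in range(height)]

def nullspace_vector(rows):
    """One nonzero solution of rows . x = 0 by Gauss-Jordan elimination, or None."""
    m = [list(row) for row in rows]
    ncols = len(m[0])
    pivots, r = [], 0
    for c in range(ncols):
        if r == len(m):
            break
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    x = [Fraction(0)] * ncols
    x[free[0]] = Fraction(1)
    for i, c in enumerate(pivots):
        x[c] = -m[i][free[0]]
    return x
```

The assertions check that the six values listed in the text satisfy every row. If the solution space is one-dimensional, the vector returned by `nullspace_vector` is a scalar multiple of those values.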
“Professor Littlewood, when he makes use of an algebraic identity, always saves himself the trouble of proving it; he maintains that an identity, if true, can be verified in a few lines by anybody obtuse enough to feel the need of verification. My object in the following pages is to confute this assertion.”

— F.J. Dyson [89] 


The other three equations are more complicated. But the main point is that
these linear equations, like all the equations that arise when we come to this
stage of the Gosper-Zeilberger algorithm, are homogeneous (their right-hand
sides are 0). So they always have a nonzero solution when the number
of unknowns exceeds the number of equations. A solution, in our case, turns
out to be

$$\begin{aligned}
\beta_0(n) &= (n+1)^3,\\
\beta_1(n) &= -(2n+3)(17n^2+51n+39),\\
\beta_2(n) &= (n+2)^3,\\
\alpha_0(n) &= -16(n+1)(n+2)(2n+3),\\
\alpha_1(n) &= -12(2n+3),\\
\alpha_2(n) &= 8(2n+3).
\end{aligned}$$

Consequently

$$(n+1)^3 t(n,k) - (2n+3)(17n^2+51n+39)\,t(n+1,k) + (n+2)^3 t(n+2,k) = T(n,k+1) - T(n,k),$$

where $T(n,k) = k^4 s(n,k)\bar t(n,k) = (2n+3)\bigl(8k^2 - 12k - 16(n+1)(n+2)\bigr) \times (n+k)!^2/(k-1)!^4\,(n+2-k)!^2$. Summing on k gives Apéry’s once-incredible
recurrence,

$$(n+1)^3 A_n + (n+2)^3 A_{n+2} = (2n+3)(17n^2+51n+39)A_{n+1}. \qquad (5.142)$$
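Apéry’s recurrence (5.142) and the certificate T(n, k) above are easy to spot-check with exact integer and rational arithmetic. This is a minimal sketch with illustrative names. As before, 1/m! is taken to be 0 for negative m.

```python
from fractions import Fraction
from math import comb, factorial

def apery(n):
    # A_n = sum_k C(n,k)^2 C(n+k,k)^2, as in (5.141)
    return sum(comb(n, k)**2 * comb(n + k, k)**2 for k in range(n + 1))

def recurrence_holds(n):
    # (5.142): (n+1)^3 A_n + (n+2)^3 A_{n+2} = (2n+3)(17n^2+51n+39) A_{n+1}
    return ((n + 1)**3 * apery(n) + (n + 2)**3 * apery(n + 2)
            == (2 * n + 3) * (17 * n * n + 51 * n + 39) * apery(n + 1))

def rf(m):
    return Fraction(0) if m < 0 else Fraction(1, factorial(m))

def t(n, k):
    return comb(n, k)**2 * comb(n + k, k)**2 if 0 <= k <= n else 0

def T(n, k):
    # T(n,k) = (2n+3)(8k^2 - 12k - 16(n+1)(n+2)) (n+k)!^2 / ((k-1)!^4 (n+2-k)!^2)
    if k < 1:
        return 0                      # 1/(k-1)! vanishes for k <= 0
    s = (2 * n + 3) * (8 * k * k - 12 * k - 16 * (n + 1) * (n + 2))
    return s * factorial(n + k)**2 * rf(k - 1)**4 * rf(n + 2 - k)**2

def certificate_holds(n):
    b0, b1, b2 = (n + 1)**3, -(2 * n + 3) * (17 * n * n + 51 * n + 39), (n + 2)**3
    return all(b0 * t(n, k) + b1 * t(n + 1, k) + b2 * t(n + 2, k) == T(n, k + 1) - T(n, k)
               for k in range(n + 3))
```

The first five values come out as 1, 5, 73, 1445, 33001, in agreement with the list quoted with (5.141).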


Does the Gosper-Zeilberger method work with all the sums we’ve encountered
in this chapter? No. It doesn’t apply when t(n, k) is the summand
$\binom{n}{k}(k+1)^{k-1}(n-k+1)^{n-k-1}$ in (5.65), because the term ratio $t(n,k+1)/t(n,k)$
is not a rational function of k. It also fails to handle cases like
$t(n,k) = \binom{n}{k}n^k$, because the other term ratio t(n + 1, k)/t(n, k) is not a
rational function of k. (We can do that one, however, by summing $\binom{n}{k}z^k$
and then setting z = n.) And it fails on a comparatively simple summand
like t(n, k) = 1/(nk + 1), even though both t(n, k + 1)/t(n, k) and
t(n + 1, k)/t(n, k) are rational functions of n and k; see exercise 107.

But the Gosper-Zeilberger algorithm is guaranteed to succeed in an enormous
number of cases, namely whenever the summand t(n, k) is a so-called
proper term, a term that can be written in the form

$$t(n,k) = f(n,k)\,\frac{(a_1n + a_1'k + a_1'')!\,\ldots\,(a_pn + a_p'k + a_p'')!}{(b_1n + b_1'k + b_1'')!\,\ldots\,(b_qn + b_q'k + b_q'')!}\,w^nz^k. \qquad (5.143)$$

Here f(n, k) is a polynomial in n and k; the coefficients $a_1, a_1', \ldots, a_p, a_p'$,
$b_1, b_1', \ldots, b_q, b_q'$ are specific integer constants; the parameters w and z
are nonzero; and the other quantities $a_1'', \ldots, a_p''$, $b_1'', \ldots, b_q''$ are arbitrary
complex numbers. We will prove that whenever t(n, k) is a proper term, there
exist polynomials $\beta_0(n), \ldots, \beta_l(n)$, not all zero, and a proper term T(n, k),
such that

$$\beta_0(n)t(n,k) + \cdots + \beta_l(n)t(n+l,k) = T(n,k+1) - T(n,k). \qquad (5.144)$$

What happens if t(n, k) is independent of n?

The following proof is due to Wilf and Zeilberger [374].

Let N be the operator that increases n by 1, and let K be the operator
that increases k by 1, so that, for example, $N^2K^3t(n,k) = t(n+2,k+3)$.

We will study linear difference operators in N, K, and n, namely operator
polynomials of the form

$$H(N,K,n) = \sum_{i=0}^{I}\sum_{j=0}^{J}\alpha_{i,j}(n)N^iK^j, \qquad (5.145)$$

where each $\alpha_{i,j}(n)$ is a polynomial in n. Our first observation is that, if t(n, k)
is any proper term and H(N, K, n) is any linear difference operator, then
H(N, K, n)t(n, k) is a proper term. Suppose t and H are given respectively
by (5.143) and (5.145); then we define a “base term”

$$\bar t(n,k)_{I,J} = \frac{\prod_{i=1}^{p}\bigl(a_in + a_i'k + a_iI[a_i<0] + a_i'J[a_i'<0] + a_i''\bigr)!}{\prod_{i=1}^{q}\bigl(b_in + b_i'k + b_iI[b_i>0] + b_i'J[b_i'>0] + b_i''\bigr)!}\,w^nz^k.$$

For example, if t(n, k) is $\binom{n-2k}{k} = (n-2k)!/k!\,(n-3k)!$, the base term
corresponding to a linear difference operator of degrees I and J is $\bar t(n,k)_{I,J} = (n-2k-2J)!/(k+J)!\,(n-3k+I)!$. The point is that $\alpha_{i,j}(n)N^iK^jt(n,k)$ is
equal to $\bar t(n,k)_{I,J}$ times a polynomial in n and k, whenever $0 \le i \le I$ and
$0 \le j \le J$. A finite sum of polynomials is a polynomial, so H(N, K, n)t(n, k)
has the required form (5.143).

The next step is to show that whenever t(n, k) is a proper term, there is
always a nonzero linear difference operator H(N, K, n) such that

$$H(N,K,n)\,t(n,k) = 0.$$

If $0 \le i \le I$ and $0 \le j \le J$, the shifted term $N^iK^jt(n,k)$ is $\bar t(n,k)_{I,J}$ times a
polynomial in n and k that has degree at most

$$D_{I,J} = \deg(f) + |a_1|I + |a_1'|J + \cdots + |a_p|I + |a_p'|J + |b_1|I + |b_1'|J + \cdots + |b_q|I + |b_q'|J$$

in the variable k. Hence the desired H exists if we can solve $D_{I,J} + 1$ homogeneous
linear equations in the $(I+1)(J+1)$ variables $\alpha_{i,j}(n)$, with coefficients
that are polynomials in n. All we need to do is choose I and J large enough
that $(I+1)(J+1) > D_{I,J} + 1$. For example, we can take $I = 2A' + 1$ and
$J = 2A + \deg(f)$, where

$$\begin{aligned}
A &= |a_1| + \cdots + |a_p| + |b_1| + \cdots + |b_q|;\\
A' &= |a_1'| + \cdots + |a_p'| + |b_1'| + \cdots + |b_q'|.
\end{aligned}$$
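The counting argument needs only the integers $a_i, a_i', b_i, b_i'$ and deg(f). The small helper below is hypothetical and not part of the text. It computes:

- A and A′;
- the suggested I and J;
- $D_{I,J}$;
- whether $(I+1)(J+1) > D_{I,J} + 1$ holds.

```python
def annihilator_dimensions(numer, denom, deg_f):
    """numer and denom are lists of integer pairs (a_i, a_i') and (b_i, b_i') from (5.143)."""
    A = sum(abs(x) for x, _ in numer) + sum(abs(x) for x, _ in denom)
    A1 = sum(abs(y) for _, y in numer) + sum(abs(y) for _, y in denom)   # this is A'
    I, J = 2 * A1 + 1, 2 * A + deg_f
    D = deg_f + A * I + A1 * J     # D_{I,J} = deg(f) + sum of |a_i| I + |a_i'| J + |b_i| I + |b_i'| J
    # more unknowns alpha_{i,j} than equations guarantees a nonzero annihilator H
    return I, J, D, (I + 1) * (J + 1) > D + 1
```

Take $\binom{n-2k}{k} = (n-2k)!/k!\,(n-3k)!$, so numer = [(1, −2)], denom = [(0, 1), (1, −3)] and deg(f) = 0. The helper returns (13, 4, 50, True): 70 unknowns comfortably exceed 51 equations. Expanding the product shows that the surplus $(I+1)(J+1) - (D_{I,J}+1)$ equals $A'\deg(f) + 2A' + 3A + \deg(f) + 1$, which is always positive.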


The trick here is based on regarding H as a polynomial in K and then replacing K by Δ + 1.


The last step in the proof is to go from the equation H(N, K, n)t(n, k) = 0
to a solution of (5.144). Let H be chosen so that J is minimized, i.e., so that
H has the smallest possible degree in K. We can write

$$H(N,K,n) = H(N,1,n) - (K-1)G(N,K,n)$$

for some linear difference operator G(N, K, n). Let $H(N,1,n) = \beta_0(n) + \beta_1(n)N + \cdots + \beta_l(n)N^l$ and T(n, k) = G(N, K, n)t(n, k). Then T(n, k) is a
proper term, and (5.144) holds.

The proof is almost complete; we still have to verify that H(N, 1, n) is not
simply the zero operator. If it is, then T(n, k) is independent of k. So there
are polynomials $\beta_0(n)$ and $\beta_1(n)$ such that $(\beta_0(n) + \beta_1(n)N)T(n,k) = 0$.
But then $(\beta_0(n) + \beta_1(n)N)G(N,K,n)$ is a nonzero linear difference operator
of degree J − 1 that annihilates t(n, k); this contradicts the minimality of J,
and our proof of (5.144) is complete.

Once we know that (5.144) holds, for some proper term T, we can be
sure that Gosper’s algorithm will succeed in finding T (or T plus a constant).
Although we proved Gosper’s algorithm only for the case of hypergeometric
terms t(k) in a single variable k, our proof can be extended to the two-variable
case, as follows: There are infinitely many complex numbers n for which
condition (5.118) holds when q(n, k) and r(n, k) are completely factored as 
polynomials in k, and for which the calculations of d in Step 2 agree with the 
calculations of Gosper’s one-variable algorithm. For all such n, our previous 
proof shows that a suitable polynomial s(n, k) in k exists; therefore a suitable 
polynomial s(n,k) in n and k exists; QED. 

We have proved that the Gosper-Zeilberger algorithm will discover a
solution to (5.144), for some l, where l is as small as possible. That solution 
gives us a recurrence in n for evaluating the sum over k of any proper term 
t(n,k), provided that t(n, k) is nonzero for only finitely many k. And the 
roles of n and k can, of course, be reversed, because the definition of proper 
term in (5.143) is symmetrical in n and k. 

Exercises 99-108 provide additional examples of the Gosper-Zeilberger
algorithm, illustrating some of its versatility. Wilf and Zeilberger [374] have
significantly extended these results to methods that handle generalized binomial
coefficients and multiple indices of summation.
Exercises


Warmups 


1 What is $11^4$? Why is this number easy to compute, for a person who
knows binomial coefficients? 

2 For which value(s) of k is $\binom{n}{k}$ a maximum, when n is a given positive
integer? Prove your answer. 

3 Prove the hexagon property,

$$\binom{n-1}{k-1}\binom{n}{k+1}\binom{n+1}{k} = \binom{n-1}{k}\binom{n}{k-1}\binom{n+1}{k+1}.$$

4 Evaluate $\binom{-1}{k}$ by negating (actually un-negating) its upper index.

5 Let p be prime. Show that $\binom{p}{k} \bmod p = 0$ for 0 < k < p. What does this
imply about the binomial coefficients $\binom{p-1}{k}$?

6 Fix up the text’s derivation in Problem 6, Section 5.2, by correctly applying
symmetry.

7 Is (5.34) true also when k < 0?

8 Evaluate

$$\sum_k \binom{n}{k}(-1)^k(1-k/n)^n.$$

What is the approximate value of this sum, when n is very large? Hint:
The sum is $\Delta^n f(0)$ for some function f.

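9 Show that the generalized exponentials of (5.58) obey the law

$$\mathcal{E}_t(z) = \mathcal{E}(tz)^{1/t}, \qquad\text{if } t \ne 0,$$

where $\mathcal{E}(z)$ is an abbreviation for $\mathcal{E}_1(z)$.

10 Show that $-2\bigl(\ln(1-z) + z\bigr)/z^2$ is a hypergeometric function.

11 Express the two functions

$$\begin{aligned}
\sin z &= z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots,\\
\arcsin z &= z + \frac{1\cdot z^3}{2\cdot3} + \frac{1\cdot3\cdot z^5}{2\cdot4\cdot5} + \frac{1\cdot3\cdot5\cdot z^7}{2\cdot4\cdot6\cdot7} + \cdots
\end{aligned}$$

in terms of hypergeometric series.

A case of mistaken identity.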
(Here t and T aren’t necessarily related as in (5.120).)


12 Which of the following functions of k is a hypergeometric term, as defined
in Section 5.7? Explain why or why not.

a $n^k$.

b $k^n$.

c (k! + (k + 1)!)/2.

d $H_k$, that is, $\frac11 + \frac12 + \cdots + \frac1k$.

e $1/\binom{n}{k}$.

f t(k)T(k), when t and T are hypergeometric terms.

g t(k) + T(k), when t and T are hypergeometric terms.

h t(n − k), when t is a hypergeometric term.

i a t(k) + b t(k + 1) + c t(k + 2), when t is a hypergeometric term.

j $\lceil k/2\rceil$.

k $k[k>0]$.

Basics 

13 Find relations between the superfactorial function $P_n = \prod_{k=1}^{n} k!$ of exercise
4.55, the hyperfactorial function $Q_n = \prod_{k=1}^{n} k^k$, and the product
$R_n = \prod_{k=0}^{n}\binom{n}{k}$.

14 Prove identity (5.25) by negating the upper index in Vandermonde’s convolution
(5.22). Then show that another negation yields (5.26).

15 What is $\sum_k \binom{n}{k}^3(-1)^k$? Hint: See (5.29).

16 Evaluate the sum 



when a,b,c are nonnegative integers. 

17 Find a simple relation between ( 2n ~ 1/2 ) and 

18 Find an alternative form analogous to (5.35) for the product

$$\binom{r}{k}\binom{r-1/3}{k}\binom{r-2/3}{k}.$$


19 Show that the generalized binomials of (5.58) obey the law

$$\mathcal{B}_t(z) = \mathcal{B}_{1-t}(-z)^{-1}.$$


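20 Define a “generalized bloopergeometric series” by the formula

$$G\Bigl({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\Bigm| z\Bigr) = \sum_{k\ge0}\frac{a_1^{\underline k}\ldots a_m^{\underline k}}{b_1^{\underline k}\ldots b_n^{\underline k}}\,\frac{z^k}{k!},$$

using falling powers instead of the rising ones in (5.76). Explain how G is
related to F.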
21 Show that Euler’s definition of factorials is consistent with the ordinary
definition, by showing that the limit in (5.83) is 1/((m − 1) . . . (1)) when
z = m is a positive integer.

22 Use (5.83) to prove the factorial duplication formula:

$$x!\,(x-\tfrac12)! = (2x)!\,(-\tfrac12)!/2^{2x}.$$

By the way, $(-\frac12)! = \sqrt\pi$.



23 What is the value of F(−n, 1; ; 1)?

24 Find ( m ™ J by using hypergeometric series.

25 Show that

$$(a_1-b_1)\,F\Bigl({a_1,a_2,\ldots,a_m \atop b_1+1,b_2,\ldots,b_n}\Bigm| z\Bigr) = a_1F\Bigl({a_1+1,a_2,\ldots,a_m \atop b_1+1,b_2,\ldots,b_n}\Bigm| z\Bigr) - b_1F\Bigl({a_1,a_2,\ldots,a_m \atop b_1,b_2,\ldots,b_n}\Bigm| z\Bigr).$$

Find a similar relation between the hypergeometrics $F\bigl({a_1,a_2,a_3,\ldots,a_m \atop b_1,\ldots,b_n}\bigm| z\bigr)$, $F\bigl({a_1+1,a_2,a_3,\ldots,a_m \atop b_1,\ldots,b_n}\bigm| z\bigr)$, and $F\bigl({a_1,a_2+1,a_3,\ldots,a_m \atop b_1,\ldots,b_n}\bigm| z\bigr)$.


26 Express the function G(z) in the formula

$$F\Bigl({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\Bigm| z\Bigr) = 1 + G(z)$$

as a multiple of a hypergeometric series.
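27 Prove that

$$F\Bigl({a_1,\,a_1+\frac12,\,\ldots,\,a_m,\,a_m+\frac12 \atop b_1,\,b_1+\frac12,\,\ldots,\,b_n,\,b_n+\frac12,\,\frac12}\Bigm| (2^{m-n-1}z)^2\Bigr) = \frac12\Bigl(F\Bigl({2a_1,\ldots,2a_m \atop 2b_1,\ldots,2b_n}\Bigm| z\Bigr) + F\Bigl({2a_1,\ldots,2a_m \atop 2b_1,\ldots,2b_n}\Bigm| -z\Bigr)\Bigr).$$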


28 Prove Euler’s identity

$$F\Bigl({a,\,b \atop c}\Bigm| z\Bigr) = (1-z)^{c-a-b}\,F\Bigl({c-a,\,c-b \atop c}\Bigm| z\Bigr)$$

by applying Pfaff’s reflection law (5.101) twice.
29 Show that confluent hypergeometrics satisfy

$$e^z F\Bigl({a \atop b}\Bigm| -z\Bigr) = F\Bigl({b-a \atop b}\Bigm| z\Bigr).$$

30 What hypergeometric series F satisfies zF'(z) + F(z) = 1/(1 — z)? 

31 Show that if f(k) is any function summable in hypergeometric terms,
then f itself is a hypergeometric term. For example, if $\sum f(k)\,\delta k = cF(A_1, \ldots, A_M; B_1, \ldots, B_N; Z)_k + C$, then there are constants $a_1, \ldots, a_m$,
$b_1, \ldots, b_n$, and z such that f(k) is a multiple of (5.115).

32 Find $\sum k^2\,\delta k$ by Gosper’s method.

33 Use Gosper’s method to find $\sum \delta k/(k^2-1)$.

34 Show that a partial hypergeometric sum can always be represented as a
limit of ordinary hypergeometrics:

$$\sum_{k\le c} F\Bigl({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\Bigm| z\Bigr)_k = \lim_{\epsilon\to0} F\Bigl({-c,\,a_1,\ldots,a_m \atop \epsilon-c,\,b_1,\ldots,b_n}\Bigm| z\Bigr),$$

when c is a nonnegative integer. (See (5.115).) Use this idea to evaluate
Homework exercises 

35 The notation $\sum_{k\le n}\binom{n}{k}2^{k-n}$ is ambiguous without context. Evaluate it
a as a sum on k;

b as a sum on n. 

36 Let $p^k$ be the largest power of the prime p that divides $\binom{m+n}{m}$, when m
and n are nonnegative integers. Prove that k is the number of carries 
that occur when m is added to n in the radix p number system. Hint: 
Exercise 4.24 helps here. 

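37 Show that an analog of the binomial theorem holds for factorial powers.
That is, prove the identities

$$\begin{aligned}
(x+y)^{\underline n} &= \sum_k\binom{n}{k}x^{\underline k}\,y^{\underline{n-k}},\\
(x+y)^{\overline n} &= \sum_k\binom{n}{k}x^{\overline k}\,y^{\overline{n-k}},
\end{aligned}$$

for all nonnegative integers n.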

38 Show that all nonnegative integers n can be represented uniquely in the
form $n = \binom{a}{1} + \binom{b}{2} + \binom{c}{3}$ where a, b, and c are integers with $0 \le a < b < c$.
(This is called the binomial number system.)
39 Show that if xy = ax + by then

$$x^ny^n = \sum_{k=1}^{n}\binom{2n-1-k}{n-1}\bigl(a^nb^{n-k}x^k + a^{n-k}b^ny^k\bigr)$$

for all n > 0. Find a similar formula for the more general product $x^my^n$.
(These formulas give useful partial fraction expansions, for example when
x = 1/(z − c) and y = 1/(z − d).)

40 Find a closed form for

$$\sum_{j=1}^{m}(-1)^{j+1}\binom{r}{j}\sum_{k=1}^{n}\binom{-j+rk+s}{m-j}, \qquad\text{integers } m, n \ge 0.$$


41 Evaluate $\sum_k \binom{n}{k}k!/(n+1+k)!$ when n is a nonnegative integer.

42 Find the indefinite sum $\sum \bigl((-1)^x\big/\binom{n}{x}\bigr)\,\delta x$, and use it to compute the sum
$\sum_{k=0}^{n} (-1)^k\big/\binom{n}{k}$ in closed form.

43 Prove the triple-binomial identity (5.28). Hint: First replace $\binom{r+k}{m+n}$ by
$\sum_j \binom{r}{m+n-j}\binom{k}{j}$.

44 Use identity (5.32) to find closed forms for the double sums


^J-1) i+k 


iA 


^ (-1) i+k 

j ,1cSs0 


j + h' 

j 

a 

1 


m 


f n — j - k 
m — j 

m + n 

i+h 


and 


given integers $m \ge a \ge 0$ and $n \ge b \ge 0$.

45 Find a closed form for ^T k<Tt 

46 Evaluate the following sum in closed form, when n is a positive integer: 

/2k- 1\ /4n-2k- 1\ 

V k 2n — k ) (2k - l)(4n-2k - 1) ' 

Hint: Generating functions win again. 

47 The sum

$$\sum_k\binom{rk+s}{k}\binom{rn-rk-s}{n-k}$$

is a polynomial in r and s. Show that it doesn’t depend on s.
48 The identity $\sum_{k\le n}\binom{n+k}{k}2^{-k} = 2^n$ can be combined with the formula
$\sum_{k\ge0}\binom{n+k}{k}z^k = 1/(1-z)^{n+1}$ to yield

$$\sum_{k>n}\binom{n+k}{k}2^{-k} = 2^n.$$

What is the hypergeometric form of the latter identity?
49 Use the hypergeometric method to evaluate 


IH 



/x + n - k\ y 
\ n — k /y+n-k' 


50 Prove Pfaff’s reflection law (5.101) by comparing the coefficients of z n 
on both sides of the equation. 

51 The derivation of (5.104) shows that

$$\lim_{\epsilon\to0}F(-m, -2m-1+\epsilon; -2m+\epsilon; 2) = 1\Big/\binom{-1/2}{m}.$$

In this exercise we will see that slightly different limiting processes lead
to distinctly different answers for the degenerate hypergeometric series
F(−m, −2m − 1; −2m; 2).

a Show that $\lim_{\epsilon\to0}F(-m+\epsilon, -2m-1; -2m+2\epsilon; 2) = 0$, by using
Pfaff’s reflection law to prove the identity F(a, −2m − 1; 2a; 2) = 0
for all integers $m \ge 0$.

b What is $\lim_{\epsilon\to0}F(-m+\epsilon, -2m-1; -2m+\epsilon; 2)$?
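52 Prove that if N is a nonnegative integer,

$$b_1^{\overline N}\ldots b_n^{\overline N}\,F\Bigl({a_1,\ldots,a_m,-N \atop b_1,\ldots,b_n}\Bigm| z\Bigr) = a_1^{\overline N}\ldots a_m^{\overline N}\,(-z)^N\,F\Bigl({1-b_1-N,\ldots,1-b_n-N,-N \atop 1-a_1-N,\ldots,1-a_m-N}\Bigm| \frac{(-1)^{m+n}}{z}\Bigr).$$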


53 If we put b = — j and z = 1 in Gauss’s identity (5.110), the left side 
reduces to —1 while the right side is +1. Why doesn’t this prove that 
-1 = + 1 ? 

54 Explain how the right-hand side of (5.112) was obtained. 

55 If the hypergeometric terms $t(k) = F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)_k$ and
$T(k) = F(A_1,\ldots,A_M; B_1,\ldots,B_N; Z)_k$ satisfy $t(k) = c\bigl(T(k+1) - T(k)\bigr)$
for all $k \ge 0$, show that z = Z and m − n = M − N.

56 Find a general formula for $\sum\binom{-3}{k}\,\delta k$ using Gosper’s method. Show that
$(-1)^{k-1}\lfloor (k+1)^2/4\rfloor$ is also a solution.
57 Use Gosper’s method to find a constant θ such that

$$\sum\binom{n}{k}z^k(k+\theta)\,\delta k$$

is summable in hypergeometric terms.

58 If m and n are integers with $0 \le m \le n$, let



1 

n-k ' 


Find a relation between $T_{m,n}$ and $T_{m-1,n-1}$, then solve your recurrence
by applying a summation factor.


Exam problems 
59 Find a closed form for 



when m and n are positive integers.

60 Use Stirling’s approximation (4.23) to estimate $\binom{m+n}{n}$ when m and n
are both large. What does your formula reduce to when m = n?

61 Prove that when p is prime, we have

$$\binom{n}{m} \equiv \binom{\lfloor n/p\rfloor}{\lfloor m/p\rfloor}\binom{n \bmod p}{m \bmod p} \pmod p,$$

for all nonnegative integers m and n.


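62 Assuming that p is prime and that m and n are positive integers, determine
the value of $\binom{np}{mp} \bmod p^2$. Hint: You may wish to use the following
generalization of Vandermonde’s convolution:

$$\sum_{k_1+k_2+\cdots+k_m=n}\binom{r_1}{k_1}\binom{r_2}{k_2}\ldots\binom{r_m}{k_m} = \binom{r_1+r_2+\cdots+r_m}{n}.$$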

63 Find a closed form for 
2 n + k' 




k=0 


2k 


given an integer $n \ge 0$.
64 Evaluate

k=0 

65 Prove that 


(n\ / 

'k + r 

w/ 

2 


, given an integer $n \ge 0$.


Y_ k ( k + 1)! = n. 

66 Evaluate “Harry’s double sum,” 

integer m > 0,

as a function of m. (The sum is over both j and k.)

67 Find a closed form for 


L 


Q)W2n-k 


k=0 

68 Find a closed form for 


L 


integer $n \ge 0$.


$\min(k, n-k)$, integer $n \ge 0$.


69 Find a closed form for

$$\min_{\substack{k_1,\ldots,k_m\ge0\\ k_1+\cdots+k_m=n}}\ \sum_{j=1}^{m}\binom{k_j}{2}$$

as a function of m and n.
70 Find a closed form for

$$\sum_k\binom{n}{k}\binom{2k}{k}\Bigl(-\frac12\Bigr)^k, \qquad\text{integer } n \ge 0.$$


71 Let

$$S_n = \sum_{k\ge0}\binom{n+k}{m+2k}a_k,$$

where m and n are nonnegative integers, and let $A(z) = \sum_{k\ge0}a_kz^k$ be
the generating function for the sequence $\langle a_0, a_1, a_2, \ldots\rangle$.

a Express the generating function $S(z) = \sum_{n\ge0}S_nz^n$ in terms of A(z).
b Use this technique to solve Problem 7 in Section 5.2. 
72 Prove that, if m, n, and k are integers and n > 0,

$$\binom{m/n}{k}n^{2k-\nu(k)} \quad\text{is an integer},$$

where ν(k) is the number of 1’s in the binary representation of k.

73 Use the repertoire method to solve the recurrence

$$X_0 = \alpha;\qquad X_1 = \beta;\qquad X_n = (n-1)(X_{n-1} + X_{n-2}), \quad\text{for } n > 1.$$

Hint: Both n! and n¡ satisfy this recurrence.

74 This problem concerns a deviant version of Pascal’s triangle in which the
sides consist of the numbers 1, 2, 3, 4, ... instead of all 1’s, although the
interior numbers still satisfy the addition formula:


1 

2 2 
3 4 3 

4 7 7 4 

5 11 14 11 5 


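If $(\!({n \atop k})\!)$ denotes the kth number in row n, for $1 \le k \le n$, we have
$(\!({n \atop 1})\!) = (\!({n \atop n})\!) = n$, and $(\!({n \atop k})\!) = (\!({n-1 \atop k})\!) + (\!({n-1 \atop k-1})\!)$ for 1 < k < n. Express
the quantity $(\!({n \atop k})\!)$ in closed form.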

75 Find a relation between the functions 



and the quantities $\lfloor 2^n/3\rfloor$ and $\lceil 2^n/3\rceil$.

76 Solve the following recurrence for $n, k \ge 0$:

$$Q_{n,0} = 1;\qquad Q_{0,k} = [k=0];\qquad Q_{n,k} = Q_{n-1,k} + Q_{n-1,k-1} + \binom{n}{k}, \quad\text{for } n, k > 0.$$
Handy to know.


77 What is the value of

$$\sum_{0\le k_1,\ldots,k_m\le n}\ \prod_{1\le j<m}\binom{k_{j+1}}{k_j}, \qquad\text{if } m > 1?$$

78 Assuming that m is a positive integer, find a closed form for 

2 y 1 / k mod m 

\(2k + 1 ) mod (2m + 1 ) 

79 a What is the greatest common divisor of $\binom{2n}{1}, \binom{2n}{3}, \ldots, \binom{2n}{2n-1}$?
Hint: Consider the sum of these n numbers.
b Show that the least common multiple of $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ is equal
to L(n + 1)/(n + 1), where L(n) = lcm(1, 2, ..., n).

80 Prove that $\binom{n}{k} \le (en/k)^k$ for all integers $k, n \ge 0$.

81 If 0 < θ < 1 and $0 \le x \le 1$, and if l, m, n are nonnegative integers with
m < n, prove the inequality



Hint: Consider taking the derivative with respect to x. 


Bonus problems 

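82 Prove that Pascal’s triangle has an even more surprising hexagon property
than the one cited in the text:

$$\gcd\Bigl(\binom{n-1}{k-1}, \binom{n}{k+1}, \binom{n+1}{k}\Bigr) = \gcd\Bigl(\binom{n-1}{k}, \binom{n}{k-1}, \binom{n+1}{k+1}\Bigr),$$

if 0 < k < n. For example, gcd(56, 36, 210) = gcd(28, 120, 126) = 2.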

83 Prove the amazing five-parameter double-sum identity (5.32).

84 Show that the second pair of convolution formulas, (5.61), follows from
the first pair, (5.60). Hint: Differentiate with respect to z.

85 Prove that 


m=1 1 ^ki <k 2 <---<k m ^n 




(The left side is a sum of $2^n - 1$ terms.) Hint: Much more is true.
86 Let $a_1, \ldots, a_n$ be nonnegative integers, and let $C(a_1, \ldots, a_n)$ be the
coefficient of the constant term $z_1^0 \ldots z_n^0$ when the n(n − 1) factors

$$\prod_{\substack{1\le i,j\le n\\ i\ne j}}\Bigl(1 - \frac{z_i}{z_j}\Bigr)^{a_i}$$

are fully expanded into positive and negative powers of the complex variables
$z_1, \ldots, z_n$.

a Prove that $C(a_1, \ldots, a_n)$ equals the left-hand side of (5.31).

b Prove that if $z_1, \ldots, z_n$ are distinct complex numbers, then the
polynomial

$$f(z) = \sum_{k=1}^{n}\ \prod_{\substack{1\le j\le n\\ j\ne k}}\frac{z - z_j}{z_k - z_j}$$

is identically equal to 1.

c Multiply the original product of n(n − 1) factors by f(0) and deduce
that $C(a_1, a_2, \ldots, a_n)$ is equal to

$$C(a_1 - 1, a_2, \ldots, a_n) + C(a_1, a_2 - 1, \ldots, a_n) + \cdots + C(a_1, a_2, \ldots, a_n - 1).$$

(This recurrence defines multinomial coefficients, so $C(a_1, \ldots, a_n)$
must equal the right-hand side of (5.31).)

87 Let m be a positive integer and let $\zeta = e^{\pi i/m}$. Show that



z mk 




(1 +m)® 


(z m ) n+1 
-m(z m ) - m 


- L 

0^j<m 


(C 2 i + 'z'B 1 + 1 /m (C 2 i + l z) Vm ) n + l 

(m+l)® 1 + 1 /m (C 2 Hi z )-i _! 


(This reduces to (5.74) in the special case m = 1.)
88 Prove that the coefficients in (5.47) are equal to


Mr 


00 Ht 

e- t (1-e- t ) k " 1 - 

n t 


for all k > 1; hence $|s_k| < 1/(k-1)$.
89 Prove that (5.19) has an infinite counterpart,

$$\sum_{k>m}\binom{m+r}{k}x^ky^{m-k} = \sum_{k>m}\binom{-r}{k}(-x)^k(x+y)^{m-k}, \qquad\text{integer } m,$$

if |x| < |y| and |x| < |x + y|. Differentiate this identity n times with
respect to y and express it in terms of hypergeometrics; what relation do
you get?

90 Problem 1 in Section 5.2 considers $\sum_{k\ge0}\binom{r}{k}\big/\binom{s}{k}$ when r and s are integers
with $s \ge r \ge 0$. What is the value of this sum if r and s aren’t
integers?

91 Prove Whipple’s identity,

$$F\Bigl({\frac12a,\ \frac12+\frac12a,\ 1+a-b-c \atop 1+a-b,\ 1+a-c}\Bigm| \frac{-4z}{(1-z)^2}\Bigr) = (1-z)^a\,F\Bigl({a,\ b,\ c \atop 1+a-b,\ 1+a-c}\Bigm| z\Bigr),$$

by showing that both sides satisfy the same differential equation.
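92 Prove Clausen’s product identities

$$ F\left( {a,\ b \atop a + b + \frac12} \,\middle|\, z \right)^2 = F\left( {2a,\ a+b,\ 2b \atop 2a + 2b,\ a + b + \frac12} \,\middle|\, z \right) ; $$

$$ F\left( {\frac14 + a,\ \frac14 + b \atop 1 + a + b} \,\middle|\, z \right) F\left( {\frac14 - a,\ \frac14 - b \atop 1 - a - b} \,\middle|\, z \right) = F\left( {\frac12,\ \frac12 + a - b,\ \frac12 - a + b \atop 1 + a + b,\ 1 - a - b} \,\middle|\, z \right) . $$

What identities result when the coefficients of $z^n$ on both sides of these
formulas are equated?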

93 Show that the indefinite sum

$$ \sum \left( \prod_{j=1}^{k-1} \bigl( f(j) + \alpha \bigr) \middle/ \prod_{j=1}^{k} f(j) \right) \delta k $$

has a (fairly) simple form, given any function f and any constant $\alpha$.

94 Find L OM 6k. 

95 What conditions in addition to (5.118) will make the polynomials p, q, r 
of (5.117) uniquely determined? 
96 Prove that if Gosper’s algorithm finds no solution to (5.120), given a
hypergeometric term t(k), then there is no solution to the more general
equation

$$ t(k) = \bigl( T_1(k+1) + \cdots + T_m(k+1) \bigr) - \bigl( T_1(k) + \cdots + T_m(k) \bigr) , $$

where $T_1(k), \ldots, T_m(k)$ are hypergeometric terms.

97 Find all complex numbers z such that $k!^2 \big/ \prod_{j=1}^{k} (j^2 + jz + 1)$ is summable
in hypergeometric terms.

98 What recurrence does the Gosper-Zeilberger method give for the sum 

s « = n (a? 

99 Use the Gosper-Zeilberger method to discover a closed form for t(n, k)
when $t(n,k) = (n+a+b+c+k)! \big/ \bigl( (n-k)!\,(c+k)!\,(b-k)!\,(a-k)!\,k! \bigr)$,
assuming that a is a nonnegative integer.

100 Find a recurrence relation for the sum 



and use the recurrence to find another formula for $S_n$.


101 Find recurrence relations satisfied by the sums 



Better use computer 
algebra for this one 
(and the next few). 


102 Use the Gosper-Zeilberger procedure to generalize the “useless” identity
(5.113): Find additional values of a, b, and z such that



has a simple closed form. 

103 Let t(n, k) be the proper term (5.143). What are the degrees of p(n, k),
q(n, k), and r(n, k) in terms of the variable k, when the Gosper-Zeilberger
procedure is applied to $\hat t(n,k) = \beta_0(n)\,t(n,k) + \cdots + \beta_l(n)\,t(n+l,k)$?
(Ignore the rare, exceptional cases.)
104 Use the Gosper-Zeilberger procedure to verify the remarkable identity

$$ \sum_k (-1)^k \binom{r-s-k}{k} \binom{r-2k}{n-k} \frac{1}{r-n-k+1} = \binom{s}{n} \frac{1}{r-2n+1} . $$

Explain why the simplest recurrence for this sum is not found.
105 Show that if $\omega = e^{2\pi i/3}$ we have

$$ \sum_{k+l+m=3n} \binom{3n}{k,\,l,\,m}^{2} \omega^{l-m} = \binom{4n}{n,\,n,\,2n} , \qquad \text{integer } n \ge 0 . $$
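A quick numerical sanity check of exercise 105 (evidence, not a proof) can be sketched in Python. To stay in exact integer arithmetic, the sketch files each term under $\omega^0$, $\omega^1$, or $\omega^2$ according to $(l - m) \bmod 3$ and then uses $\omega^2 = -1 - \omega$. The helper names `multinomial` and `lhs_105` are illustrative, not from the text.

```python
from math import factorial


def multinomial(*parts):
    """Multinomial coefficient (p1 + p2 + ...)! / (p1! p2! ...)."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)   # each partial quotient is still an integer
    return result


def lhs_105(n):
    """Left side of exercise 105 as a pair (a, b) meaning a + b*omega."""
    buckets = [0, 0, 0]           # total weight attached to omega^0, omega^1, omega^2
    total = 3 * n
    for k in range(total + 1):
        for l in range(total - k + 1):
            m = total - k - l
            buckets[(l - m) % 3] += multinomial(k, l, m) ** 2
    a, b, c = buckets
    # a + b*omega + c*omega^2 = (a - c) + (b - c)*omega, because omega^2 = -1 - omega
    return a - c, b - c


for n in range(5):
    print(n, lhs_105(n), multinomial(n, n, 2 * n))
```

A zero second component confirms that the sum is real. The first component should then agree with $\binom{4n}{n,n,2n}$; for n = 1 both sides equal 12.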


106 Prove the amazing identity (5.32) by letting t(r, j, k) be the summand
divided by the right-hand side, then showing that there are functions
T(r, j, k) and U(r, j, k) for which

$$ t(r+1,j,k) - t(r,j,k) = T(r,j+1,k) - T(r,j,k) + U(r,j,k+1) - U(r,j,k) . $$


107 Prove that 1/(nk+ 1) is not a proper term. 

108 Show that the Apéry numbers $A_n$ of (5.141) are the diagonal elements
$A_{n,n}$ of a matrix of numbers defined by


A 


m,n 



2m + n — j 
2m 



Prove, in fact, that this matrix is symmetric, and that

$$ A_{m,n} = \sum_k \binom{m+n-k}{k}^2 \binom{m+n-2k}{m-k}^2 = \sum_k \binom{m}{k} \binom{n}{k} \binom{m+k}{k} \binom{n+k}{k} . $$


109 Prove that the Apéry numbers (5.141) satisfy

$$ A_n \equiv A_{\lfloor n/p \rfloor} \, A_{n \bmod p} \pmod p $$

for all primes p and all integers $n \ge 0$.
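Before attempting a proof, one can test the congruence of exercise 109 numerically. This Python sketch assumes that (5.141) is the Apéry sum $A_n = \sum_k \binom{n}{k}^2 \binom{n+k}{k}^2$, which is the diagonal case m = n of the last sum in exercise 108. The names `apery` and `check_109` are illustrative.

```python
from math import comb


def apery(n):
    """A_n = sum_k C(n,k)^2 C(n+k,k)^2, the case m = n of exercise 108."""
    return sum(comb(n, k) ** 2 * comb(n + k, k) ** 2 for k in range(n + 1))


def check_109(p, n_max):
    """List every n <= n_max with A_n not congruent to A_(n//p) * A_(n mod p) modulo p."""
    return [n for n in range(n_max + 1)
            if apery(n) % p != (apery(n // p) * apery(n % p)) % p]


for p in (2, 3, 5, 7, 11, 13):
    print(p, check_109(p, 60))   # an empty list means no counterexample up to n = 60
```

A finite search like this can only provide evidence. The exercise asks for a proof valid for every prime p and every n.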

Research problems 

110 For what values of n is $\binom{2n}{n} \equiv (-1)^n \pmod{2n+1}$?
111 Let q(n) be the smallest odd prime factor of the middle binomial coefficient
$\binom{2n}{n}$. According to exercise 36, the odd primes p that do not
divide $\binom{2n}{n}$ are those for which all digits in n’s radix p representation
are $(p-1)/2$ or less. Computer experiments have shown that $q(n) \le 11$
for $1 < n < 10^{10000}$, except that q(3160) = 13.

a Is $q(n) \le 11$ for all n > 3160?
b Is q(n) = 11 for infinitely many n?

A reward of \$7 · 11 · 13 is offered for a solution to either (a) or (b).

112 Is $\binom{2n}{n}$ divisible either by 4 or by 9, for all n > 4 except n = 64 and
n = 256?

113 If t(n+1, k)/t(n, k) and t(n, k+1)/t(n, k) are rational functions of n
and k, and if there is a nonzero linear difference operator H(N, K, n) such
that H(N, K, n) t(n, k) = 0, does it follow that t(n, k) is a proper term?

114 Let m be a positive integer, and define the sequence $C_n^{(m)}$ by the recurrence



Are these numbers $C_n^{(m)}$ integers?



6



Special Numbers 


SOME SEQUENCES of numbers arise so often in mathematics that we recognize
them instantly and give them special names. For example, everybody
who learns arithmetic knows the sequence of square numbers (1, 4, 9, 16, ...).
In Chapter 1 we encountered the triangular numbers (1, 3, 6, 10, ...); in
Chapter 4 we studied the prime numbers (2, 3, 5, 7, ...); in Chapter 5 we looked
briefly at the Catalan numbers (1, 2, 5, 14, ...).

In the present chapter we’ll get to know a few other important sequences.
First on our agenda will be the Stirling numbers ${n \brace k}$ and ${n \brack k}$, and the Eulerian
numbers $\left\langle {n \atop k} \right\rangle$; these form triangular patterns of coefficients analogous to the
binomial coefficients $\binom{n}{k}$ in Pascal’s triangle. Then we’ll take a good look
at the harmonic numbers $H_n$ and the Bernoulli numbers $B_n$; these differ
from the other sequences we’ve been studying because they’re fractions, not
integers. Finally, we’ll examine the fascinating Fibonacci numbers $F_n$ and
some of their important generalizations.


6.1 STIRLING NUMBERS 


“. . . with this notation, the formulas become more symmetric.”

— J. Karamata [199] 


We begin with some close relatives of the binomial coefficients, the
Stirling numbers, named after James Stirling (1692–1770). These numbers
come in two flavors, traditionally called by the no-frills names “Stirling numbers
of the first and second kind.” Although they have a venerable history
and numerous applications, they still lack a standard notation. Following Jovan
Karamata, we will write ${n \brace k}$ for Stirling numbers of the second kind and
${n \brack k}$ for Stirling numbers of the first kind; these symbols turn out to be more
user-friendly than the many other notations that people have tried.

Tables 258 and 259 show what ${n \brace k}$ and ${n \brack k}$ look like when n and k are
small. A problem that involves the numbers “1, 7, 6, 1” is likely to be related
to ${n \brace k}$, and a problem that involves “6, 11, 6, 1” is likely to be related to
${n \brack k}$, just as we assume that a problem involving “1, 4, 6, 4, 1” is likely to be
related to $\binom{n}{k}$; these are the trademark sequences that appear when n = 4.
Table 258 Stirling’s triangle for subsets.



Stirling numbers of the second kind show up more often than those of
the other variety, so let’s consider last things first. The symbol ${n \brace k}$ stands
for the number of ways to partition a set of n things into k nonempty subsets.
For example, there are seven ways to split a four-element set into two parts:


(Stirling himself 
considered this 
kind first in his 
book [343].) 


$$ \{1,2,3\} \cup \{4\} , \quad \{1,2,4\} \cup \{3\} , \quad \{1,3,4\} \cup \{2\} , \quad \{2,3,4\} \cup \{1\} , $$

$$ \{1,2\} \cup \{3,4\} , \quad \{1,3\} \cup \{2,4\} , \quad \{1,4\} \cup \{2,3\} ; \tag{6.1} $$


thus ${4 \brace 2} = 7$. Notice that curly braces are used to denote sets as well as
the numbers ${n \brace k}$. This notational kinship helps us remember the meaning of
${n \brace k}$, which can be read “n subset k.”

Let’s look at small k. There’s just one way to put n elements into a single
nonempty set; hence ${n \brace 1} = 1$, for all n > 0. On the other hand ${0 \brace 1} = 0$,
because a 0-element set is empty.

The case k = 0 is a bit tricky. Things work out best if we agree that
there’s just one way to partition an empty set into zero nonempty parts; hence
${0 \brace 0} = 1$. But a nonempty set needs at least one part, so ${n \brace 0} = 0$ for n > 0.

What happens when k = 2? Certainly ${0 \brace 2} = 0$. If a set of n > 0 objects
is divided into two nonempty parts, one of those parts contains the last object
and some subset of the first n − 1 objects. There are $2^{n-1}$ ways to choose the
latter subset, since each of the first n − 1 objects is either in it or out of it;
but we mustn’t put all of those objects in it, because we want to end up with
two nonempty parts. Therefore we subtract 1:

$$ {n \brace 2} = 2^{n-1} - 1 , \qquad \text{integer } n > 0 . \tag{6.2} $$

(This tallies with our enumeration of ${4 \brace 2} = 7 = 2^3 - 1$ ways above.)
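The count (6.2) can also be confirmed by brute force. The Python sketch below is illustrative; the helpers `set_partitions` and `subset_count` are not from the text. It builds every partition of {1, ..., n} by deciding where the first element goes, either into a block by itself or into one of the blocks of a partition of the remaining elements. It then counts the partitions that have exactly k blocks.

```python
def set_partitions(items):
    """Yield each partition of the list `items` into nonempty blocks, exactly once."""
    if not items:
        yield []                           # the empty set has one partition, with zero blocks
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition        # `first` forms a block by itself
        for i in range(len(partition)):    # or `first` joins one of the existing blocks
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def subset_count(n, k):
    """Number of partitions of {1, ..., n} into exactly k nonempty subsets."""
    return sum(1 for p in set_partitions(list(range(1, n + 1))) if len(p) == k)


print(subset_count(4, 2))                          # 7, the partitions listed in (6.1)
print([subset_count(n, 2) for n in range(1, 8)])   # [0, 1, 3, 7, 15, 31, 63]
```

The second line of output matches $2^{n-1} - 1$ for n = 1, ..., 7. Enumeration becomes slow quickly, because the total number of partitions grows faster than exponentially.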
Table 259 Stirling’s triangle for cycles.

n\k   0       1        2        3       4      5     6    7   8   9
  0   1
  1   0       1
  2   0       1        1
  3   0       2        3        1
  4   0       6       11        6       1
  5   0      24       50       35      10      1
  6   0     120      274      225      85     15     1
  7   0     720     1764     1624     735    175    21    1
  8   0    5040    13068    13132    6769   1960   322   28   1
  9   0   40320   109584   118124   67284  22449  4536  546  36   1


A modification of this argument leads to a recurrence by which we can
compute ${n \brace k}$ for all k: Given a set of n > 0 objects to be partitioned into k
nonempty parts, we either put the last object into a class by itself (in ${n-1 \brace k-1}$
ways), or we put it together with some nonempty subset of the first n − 1
objects. There are $k{n-1 \brace k}$ possibilities in the latter case, because each of the
${n-1 \brace k}$ ways to distribute the first n − 1 objects into k nonempty parts gives
k subsets that the nth object can join. Hence

$$ {n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1} , \qquad \text{integer } n > 0 . \tag{6.3} $$

This is the law that generates Table 258; without the factor of k it would
reduce to the addition formula (5.8) that generates Pascal’s triangle.

And now, Stirling numbers of the first kind. These are somewhat like
the others, but ${n \brack k}$ counts the number of ways to arrange n objects into k
cycles instead of subsets. We verbalize ‘${n \brack k}$’ by saying “n cycle k.”

Cycles are cyclic arrangements, like the necklaces we considered in
Chapter 4. The cycle

[Figure: the elements A, B, C, D arranged in order around a circle]

can be written more compactly as ‘[A, B, C, D]’, with the understanding that

[A, B, C, D] = [B, C, D, A] = [C, D, A, B] = [D, A, B, C] ;

a cycle “wraps around” because its end is joined to its beginning. On the other 
hand, the cycle [A, B, C, D] is not the same as [A, B, D, C] or [D, C, B, A]. 
There are eleven different ways to make two cycles from four elements:

[1,2,3] [4], [1,2,4] [3], [1,3,4] [2], [2,3,4] [1],

[1,3,2] [4], [1,4,2] [3], [1,4,3] [2], [2,4,3] [1],

[1,2] [3,4], [1,3] [2,4], [1,4] [2,3]; (6.4)

hence ${4 \brack 2} = 11$.

A singleton cycle (that is, a cycle with only one element) is essentially
the same as a singleton set (a set with only one element). Similarly, a 2-cycle
is like a 2-set, because we have [A, B] = [B, A] just as {A, B} = {B, A}. But
there are two different 3-cycles, [A, B, C] and [A, C, B]. Notice, for example,
that the eleven cycle pairs in (6.4) can be obtained from the seven set pairs
in (6.1) by making two cycles from each of the 3-element sets.

In general, n!/n = (n − 1)! different n-cycles can be made from any n-element
set, whenever n > 0. (There are n! permutations, and each n-cycle
corresponds to n of them because any one of its elements can be listed first.)
Therefore we have


“There are nine 
and sixty ways 
of constructing 
tribal lays, 
And-every-single-
one-of-them-is-
right.”

— Rudyard Kipling 


$$ {n \brack 1} = (n-1)! , \qquad \text{integer } n > 0 . \tag{6.5} $$


This is much larger than the value ${n \brace 1} = 1$ we had for Stirling subset numbers.
In fact, it is easy to see that the cycle numbers must be at least as large as
the subset numbers,

$$ {n \brack k} \ge {n \brace k} , \qquad \text{integers } n, k \ge 0 , \tag{6.6} $$

because every partition into nonempty subsets leads to at least one arrangement
of cycles.

Equality holds in (6.6) when all the cycles are necessarily singletons or
doubletons, because cycles are equivalent to subsets in such cases. This happens
when k = n and when k = n − 1; hence

$$ {n \brack n} = {n \brace n} = 1 ; \qquad {n \brack n-1} = {n \brace n-1} . $$

In fact, it is easy to see that

$$ {n \brack n-1} = {n \brace n-1} = \binom{n}{2} . \tag{6.7} $$

(The number of ways to arrange n objects into n − 1 cycles or subsets is the
number of ways to choose the two objects that will be in the same cycle or
subset.) The triangular numbers $\binom{n}{2} = 1, 3, 6, 10, \ldots$ are conspicuously
present in both Table 258 and Table 259.
We can derive a recurrence for ${n \brack k}$ by modifying the argument we used
for ${n \brace k}$. Every arrangement of n objects in k cycles either puts the last object
into a cycle by itself (in ${n-1 \brack k-1}$ ways) or inserts that object into one of the
${n-1 \brack k}$ cycle arrangements of the first n − 1 objects. In the latter case, there
are n − 1 different ways to do the insertion. (This takes some thought, but
it’s not hard to verify that there are j ways to put a new element into a j-cycle
in order to make a (j+1)-cycle. When j = 3, for example, the cycle [A, B, C]
leads to

[A, B, C, D], [A, B, D, C], or [A, D, B, C]

when we insert a new element D, and there are no other possibilities. Summing
over all j gives a total of n − 1 ways to insert an nth object into a cycle
decomposition of n − 1 objects.) The desired recurrence is therefore

$$ {n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1} , \qquad \text{integer } n > 0 . \tag{6.8} $$

This is the addition-formula analog that generates Table 259.
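Changing the multiplier in the inner loop from k to n − 1 turns a Table 258 generator into one for Table 259, following (6.8). The sketch below is illustrative. Comparing its last row with the table, and checking columns 1 and n − 1 against (6.5) and (6.7), gives a convenient consistency test.

```python
from math import factorial


def stirling_cycle_rows(n_max):
    """Rows 0..n_max of Stirling's triangle for cycles, computed with (6.8)."""
    rows = [[1]]                                       # [0 0] = 1
    for n in range(1, n_max + 1):
        prev = rows[-1] + [0]                          # pad so prev[k] exists for k = n
        row = [0] * (n + 1)                            # row[0] = [n 0] = 0 for n > 0
        for k in range(1, n + 1):
            row[k] = (n - 1) * prev[k] + prev[k - 1]   # (6.8)
        rows.append(row)
    return rows


rows = stirling_cycle_rows(9)
print(rows[4])                                                        # [0, 6, 11, 6, 1]
assert all(rows[n][1] == factorial(n - 1) for n in range(1, 10))      # (6.5)
assert all(rows[n][n - 1] == n * (n - 1) // 2 for n in range(1, 10))  # (6.7)
```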

Comparison of (6.8) and (6.3) shows that the first term on the right side is
multiplied by its upper index (n − 1) in the case of Stirling cycle numbers, but
by its lower index k in the case of Stirling subset numbers. We can therefore
perform “absorption” in terms like $n{n \brack k}$ and $k{n \brace k}$, when we do proofs by
mathematical induction.

Every permutation is equivalent to a set of cycles. For example, consider
the permutation that takes 123456789 into 384729156. We can conveniently
represent it in two rows,

123456789
384729156 ,

showing that 1 becomes 3 and 2 becomes 8, etc. The cycle structure comes
about because 1 becomes 3, which becomes 4, which becomes 7, which becomes
the original element 1; that’s the cycle [1,3,4,7]. Another cycle in
this permutation is [2,8,5]; still another is [6,9]. Therefore the permutation
384729156 is equivalent to the cycle arrangement

[1,3,4,7] [2,8,5] [6,9] .

If we have any permutation $\pi_1 \pi_2 \ldots \pi_n$ of {1, 2, ..., n}, every element is in a
unique cycle. For if we start with $m_0 = m$ and look at $m_1 = \pi_{m_0}$, $m_2 = \pi_{m_1}$,
etc., we must eventually come back to $m_k = m_0$. (The numbers must repeat
sooner or later, and the first number to reappear must be $m_0$ because we
know the unique predecessors of the other numbers $m_1, m_2, \ldots, m_{k-1}$.)
Therefore every permutation defines a cycle arrangement. Conversely, every
cycle arrangement obviously defines a permutation if we reverse the construction,
and this one-to-one correspondence shows that permutations and cycle
arrangements are essentially the same thing.
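The argument above is effectively an algorithm. Follow $m_0, m_1, m_2, \ldots$ until the walk returns to its start, then begin again at the smallest element not yet visited. Here is a Python sketch of that procedure; the function name `cycles_of` is illustrative. It recovers the cycle arrangement of 384729156.

```python
def cycles_of(perm):
    """Cycle arrangement of a permutation of {1, ..., n} given in one-line notation.

    perm[i - 1] is the image of i, so [3, 8, 4, 7, 2, 9, 1, 5, 6] sends 1 to 3, 2 to 8, ...
    """
    n = len(perm)
    seen = [False] * (n + 1)
    result = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle, m = [], start
        while not seen[m]:        # m_0 = start, m_1 = pi(m_0), m_2 = pi(m_1), ...
            seen[m] = True
            cycle.append(m)
            m = perm[m - 1]
        assert m == start         # the first value to reappear is always m_0
        result.append(cycle)
    return result


print(cycles_of([int(c) for c in "384729156"]))   # [[1, 3, 4, 7], [2, 8, 5], [6, 9]]
```

The `assert` inside the loop restates the text’s observation that the first repeated value must be $m_0$. It can fail only if the input is not a permutation.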

Therefore ${n \brack k}$ is the number of permutations of n objects that contain
exactly k cycles. If we sum ${n \brack k}$ over all k, we must get the total number of
permutations:

$$ \sum_k {n \brack k} = n! , \qquad \text{integer } n \ge 0 . \tag{6.9} $$

For example, 6 + 11 + 6 + 1 = 24 = 4!.
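Since ${n \brack k}$ counts permutations with k cycles, a brute-force tally over all n! permutations should reproduce the rows of Table 259 and the sum (6.9). The following small Python sketch uses `itertools.permutations`. The helper names are illustrative, and for convenience the sketch permutes {0, ..., n − 1}.

```python
from itertools import permutations
from math import factorial


def number_of_cycles(perm):
    """Count the cycles of a permutation of {0, ..., n-1} given in one-line notation."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count


def cycle_tally(n):
    """tally[k] = number of permutations of n objects that have exactly k cycles."""
    tally = [0] * (n + 1)
    for perm in permutations(range(n)):
        tally[number_of_cycles(perm)] += 1
    return tally


print(cycle_tally(4), sum(cycle_tally(4)))   # [0, 6, 11, 6, 1] 24
```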

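Stirling numbers are useful because the recurrence relations (6.3) and
(6.8) arise in a variety of problems. For example, if we want to represent
ordinary powers $x^n$ by falling powers $x^{\underline{n}}$, we find that the first few cases are

$$ \begin{aligned} x^2 &= x^{\underline{2}} + x^{\underline{1}} ; \\ x^3 &= x^{\underline{3}} + 3x^{\underline{2}} + x^{\underline{1}} ; \\ x^4 &= x^{\underline{4}} + 6x^{\underline{3}} + 7x^{\underline{2}} + x^{\underline{1}} . \end{aligned} $$

These coefficients look suspiciously like the numbers in Table 258, reflected
between left and right; therefore we can be pretty confident that the general
formula is

$$ x^n = \sum_k {n \brace k} x^{\underline{k}} , \qquad \text{integer } n \ge 0 . \tag{6.10} $$

And sure enough, a simple proof by induction clinches the argument: We
have $x \cdot x^{\underline{k}} = x^{\underline{k+1}} + k x^{\underline{k}}$, because $x^{\underline{k+1}} = x^{\underline{k}}(x-k)$; hence $x \cdot x^{n-1}$ is

$$ \begin{aligned} x \sum_k {n-1 \brace k} x^{\underline{k}} &= \sum_k {n-1 \brace k} x^{\underline{k+1}} + \sum_k {n-1 \brace k} k x^{\underline{k}} \\ &= \sum_k {n-1 \brace k-1} x^{\underline{k}} + \sum_k {n-1 \brace k} k x^{\underline{k}} \\ &= \sum_k \left( k{n-1 \brace k} + {n-1 \brace k-1} \right) x^{\underline{k}} = \sum_k {n \brace k} x^{\underline{k}} . \end{aligned} $$

In other words, Stirling subset numbers are the coefficients of factorial powers
that yield ordinary powers.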
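Identity (6.10) is easy to spot-check numerically, because both sides are polynomials in x. In this Python sketch the Stirling subset numbers come from recurrence (6.3), using the convention that they vanish when k < 0. The names `falling` and `subset` are illustrative.

```python
from functools import lru_cache


def falling(x, k):
    """Falling factorial power x(x-1)...(x-k+1); equals 1 when k = 0."""
    result = 1
    for j in range(k):
        result *= x - j
    return result


@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n k} for n >= 0, via recurrence (6.3)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k <= 0:
        return 0                  # {n 0} = 0 for n > 0, and {n k} = 0 for k < 0
    return k * subset(n - 1, k) + subset(n - 1, k - 1)


# (6.10): x^n = sum_k {n k} x^(k falling), tested at integer points, negatives included
for n in range(8):
    for x in range(-6, 10):
        assert x ** n == sum(subset(n, k) * falling(x, k) for k in range(n + 1))
print("(6.10) holds at all tested points")
```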


We’d better define

${n \brace k} = {n \brack k} = 0$

when k < 0 and
n ≥ 0.
We can go the other way too, because Stirling cycle numbers are the
coefficients of ordinary powers that yield factorial powers:

$$ \begin{aligned} x^{\overline{2}} &= x^2 + x^1 ; \\ x^{\overline{3}} &= x^3 + 3x^2 + 2x^1 ; \\ x^{\overline{4}} &= x^4 + 6x^3 + 11x^2 + 6x^1 . \end{aligned} $$

We have $(x + n - 1) \cdot x^k = x^{k+1} + (n-1)x^k$, so a proof like the one just given
shows that

$$ (x + n - 1)\, x^{\overline{n-1}} = (x + n - 1) \sum_k {n-1 \brack k} x^k = \sum_k {n \brack k} x^k . $$

This leads to a proof by induction of the general formula

$$ x^{\overline{n}} = \sum_k {n \brack k} x^k , \qquad \text{integer } n \ge 0 . \tag{6.11} $$

(Setting x = 1 gives (6.9) again.)
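The same kind of spot check works for (6.11). The sketch below builds the cycle numbers from (6.8) and evaluates both sides at integer points. Its final line confirms that x = 1 recovers (6.9). The names `rising` and `cycle` are illustrative.

```python
from functools import lru_cache
from math import factorial


def rising(x, k):
    """Rising factorial power x(x+1)...(x+k-1); equals 1 when k = 0."""
    result = 1
    for j in range(k):
        result *= x + j
    return result


@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n k] for n >= 0, via recurrence (6.8)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k <= 0:
        return 0
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)


# (6.11): x^(n rising) = sum_k [n k] x^k
for n in range(8):
    for x in range(-6, 10):
        assert rising(x, n) == sum(cycle(n, k) * x ** k for k in range(n + 1))
# x = 1 turns (6.11) into (6.9): 1^(n rising) = n!
assert all(rising(1, n) == factorial(n) for n in range(8))
```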

But wait, you say. This equation involves rising factorial powers $x^{\overline{n}}$,
while (6.10) involves falling factorials $x^{\underline{n}}$. What if we want to express $x^{\underline{n}}$
in terms of ordinary powers, or if we want to express $x^n$ in terms of rising
powers? Easy; we just throw in some minus signs and get

$$ x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} , \qquad \text{integer } n \ge 0 ; \tag{6.12} $$

$$ x^{\underline{n}} = \sum_k {n \brack k} (-1)^{n-k} x^k , \qquad \text{integer } n \ge 0 . \tag{6.13} $$

This works because, for example, the formula

$$ x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x $$

is just like the formula

$$ x^{\overline{4}} = x(x+1)(x+2)(x+3) = x^4 + 6x^3 + 11x^2 + 6x $$

but with alternating signs. The general identity

$$ x^{\underline{n}} = (-1)^n (-x)^{\overline{n}} \tag{6.14} $$

of exercise 2.17 converts (6.10) to (6.12) and (6.11) to (6.13) if we negate x.
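The signed conversions (6.12) and (6.13), together with the reflection law (6.14) that produces them, can be spot-checked numerically as well. The self-contained Python sketch below defines compact versions of the illustrative helpers `falling`, `rising`, `subset`, and `cycle`.

```python
from functools import lru_cache


def falling(x, k):
    result = 1
    for j in range(k):
        result *= x - j
    return result


def rising(x, k):
    result = 1
    for j in range(k):
        result *= x + j
    return result


@lru_cache(maxsize=None)
def subset(n, k):          # {n k}, recurrence (6.3)
    if n == 0:
        return int(k == 0)
    return 0 if k <= 0 else k * subset(n - 1, k) + subset(n - 1, k - 1)


@lru_cache(maxsize=None)
def cycle(n, k):           # [n k], recurrence (6.8)
    if n == 0:
        return int(k == 0)
    return 0 if k <= 0 else (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)


for n in range(8):
    for x in range(-6, 10):
        # (6.12): ordinary powers from rising powers, with alternating signs
        assert x ** n == sum(subset(n, k) * (-1) ** (n - k) * rising(x, k) for k in range(n + 1))
        # (6.13): falling powers from ordinary powers, with alternating signs
        assert falling(x, n) == sum(cycle(n, k) * (-1) ** (n - k) * x ** k for k in range(n + 1))
        # (6.14): x^(n falling) = (-1)^n (-x)^(n rising)
        assert falling(x, n) == (-1) ** n * rising(-x, n)
```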
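Table 264 Basic Stirling number identities, for integer n ≥ 0.

Recurrences:

$$ {n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1} ; $$

$$ {n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1} . $$

Special values:

$$ {n \brace 0} = {n \brack 0} = [n = 0] . $$

$$ {n \brace 1} = [n > 0] ; \qquad {n \brack 1} = (n-1)!\,[n > 0] . $$

$$ {n \brace 2} = (2^{n-1} - 1)\,[n > 0] ; \qquad {n \brack 2} = (n-1)!\,H_{n-1}\,[n > 0] . $$

$$ {n \brace n-1} = {n \brack n-1} = \binom{n}{2} . $$

$$ {n \brace n} = {n \brack n} = \binom{n}{n} = 1 . $$

$$ {n \brace k} = {n \brack k} = \binom{n}{k} = 0 , \qquad \text{if } k > n . $$

Converting between powers:

$$ x^n = \sum_k {n \brace k} x^{\underline{k}} = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} ; $$

$$ x^{\underline{n}} = \sum_k {n \brack k} (-1)^{n-k} x^k ; $$

$$ x^{\overline{n}} = \sum_k {n \brack k} x^k . $$

Inversion formulas:

$$ \sum_k {n \brack k} {k \brace m} (-1)^{n-k} = [m = n] ; $$

$$ \sum_k {n \brace k} {k \brack m} (-1)^{n-k} = [m = n] . $$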
Table 265 Additional Stirling number identities, for integers l, m, n ≥ 0.


n m (-ir- m £ 

= V n - m n k 

/_ k k-m 
k 


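Also,

$$ \binom{n}{m} (n-1)^{\underline{n-m}} = \sum_k {n \brack k} {k \brace m} , $$

a generalization of (6.9).

$$ {n+1 \brace m+1} = \sum_k \binom{n}{k} {k \brace m} \tag{6.15} $$

$$ {n+1 \brack m+1} = \sum_k {n \brack k} \binom{k}{m} \tag{6.16} $$

$$ {n \brace m} = \sum_k \binom{n}{k} {k+1 \brace m+1} (-1)^{n-k} \tag{6.17} $$

$$ {n \brack m} = \sum_k {n+1 \brack k+1} \binom{k}{m} (-1)^{m-k} \tag{6.18} $$

$$ m!\,{n \brace m} = \sum_k \binom{m}{k} k^n (-1)^{m-k} \tag{6.19} $$

$$ {n+1 \brace m+1} = \sum_{k=0}^{n} {k \brace m} (m+1)^{n-k} \tag{6.20} $$

$$ {n+1 \brack m+1} = \sum_{k=0}^{n} {k \brack m} n^{\underline{n-k}} = n! \sum_{k=0}^{n} {k \brack m} \Big/ k! \tag{6.21} $$

$$ {m+n+1 \brace m} = \sum_{k=0}^{m} k {n+k \brace k} \tag{6.22} $$

$$ {m+n+1 \brack m} = \sum_{k=0}^{m} (n+k) {n+k \brack k} \tag{6.23} $$

$$ \binom{n}{m} = \sum_k {n+1 \brace k+1} {k \brack m} (-1)^{m-k} \tag{6.24} $$

$$ n^{\underline{n-m}}\,[n \ge m] = \sum_k {n+1 \brack k+1} {k \brace m} (-1)^{m-k} \tag{6.25} $$

$$ {n \brace n-m} = \sum_k \binom{m-n}{m+k} \binom{m+n}{n+k} {m+k \brack k} \tag{6.26} $$

$$ {n \brack n-m} = \sum_k \binom{m-n}{m+k} \binom{m+n}{n+k} {m+k \brace k} \tag{6.27} $$

$$ {n \brace l+m} \binom{l+m}{l} = \sum_k {k \brace l} {n-k \brace m} \binom{n}{k} \tag{6.28} $$

$$ {n \brack l+m} \binom{l+m}{l} = \sum_k {k \brack l} {n-k \brack m} \binom{n}{k} \tag{6.29} $$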
We can remember when to stick the $(-1)^{n-k}$ factor into a formula like
(6.12) because there’s a natural ordering of powers when x is large:

$$ x^{\overline{n}} > x^n > x^{\underline{n}} , \qquad \text{for all } x > n > 1 . \tag{6.30} $$

The Stirling numbers ${n \brack k}$ and ${n \brace k}$ are nonnegative, so we have to use minus
signs when expanding a “small” power in terms of “large” ones.

We can plug (6.11) into (6.12) and get a double sum:

$$ x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} = \sum_{k,m} {n \brace k} {k \brack m} (-1)^{n-k} x^m . $$

This holds for all x, so the coefficients of $x^0, x^1, \ldots, x^{n-1}, x^{n+1}, x^{n+2}, \ldots$
on the right must all be zero and we must have the identity

$$ \sum_k {n \brace k} {k \brack m} (-1)^{n-k} = [m = n] , \qquad \text{integers } m, n \ge 0 . \tag{6.31} $$
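Here is a short numerical check of the inversion formula (6.31) and of its companion in Table 264. Both triangles are built from (6.3) and (6.8). The helper `triangle` is an illustrative name, and the check is evidence only, not a proof.

```python
def triangle(n_max, weight):
    """Rows 0..n_max of T(n,k) = weight(n,k) T(n-1,k) + T(n-1,k-1), with T(0,k) = [k = 0]."""
    rows = [[1] + [0] * n_max]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        rows.append([0] + [weight(n, k) * prev[k] + prev[k - 1] for k in range(1, n_max + 1)])
    return rows


N = 9
S = triangle(N, lambda n, k: k)        # S[n][k] = {n k}, recurrence (6.3)
C = triangle(N, lambda n, k: n - 1)    # C[n][k] = [n k], recurrence (6.8)

for n in range(N + 1):
    for m in range(N + 1):
        delta = int(m == n)
        # terms with k > n vanish, so summing k = 0..n is enough
        assert sum(S[n][k] * C[k][m] * (-1) ** (n - k) for k in range(n + 1)) == delta  # (6.31)
        assert sum(C[n][k] * S[k][m] * (-1) ** (n - k) for k in range(n + 1)) == delta  # Table 264
```

In matrix language, the two triangles with alternating signs are inverse lower-triangular matrices.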



Stirling numbers, like binomial coefficients, satisfy many surprising iden- 
tities. But these identities aren’t as versatile as the ones we had in Chapter 5, 
so they aren’t applied nearly as often. Therefore it’s best for us just to list 
the simplest ones, for future reference when a tough Stirling nut needs to be 
cracked. Tables 264 and 265 contain the formulas that are most frequently 
useful; the principal identities we have already derived are repeated there. 

When we studied binomial coefficients in Chapter 5, we found that it
was advantageous to define $\binom{n}{k}$ for negative n in such a way that the identity
$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ is valid without any restrictions. Using that identity to
extend the $\binom{n}{k}$’s beyond those with combinatorial significance, we discovered
(in Table 164) that Pascal’s triangle essentially reproduces itself in a rotated
form when we extend it upward. Let’s try the same thing with Stirling’s
triangles: What happens if we decide that the basic recurrences

$$ {n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1} , $$

$$ {n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1} $$

are valid for all integers n and k? The solution becomes unique if we make
the reasonable additional stipulations that

$$ {0 \brace k} = {0 \brack k} = [k = 0] \quad \text{and} \quad {n \brace 0} = {n \brack 0} = [n = 0] . \tag{6.32} $$
Table 267 Stirling’s triangles in tandem.



In fact, a surprisingly pretty pattern emerges: Stirling’s triangle for cycles
appears above Stirling’s triangle for subsets, and vice versa! The two kinds
of Stirling numbers are related by an extremely simple law [220, 221]:

$$ {n \brack k} = {-k \brace -n} , \qquad \text{integers } k, n . \tag{6.33} $$

We have “duality,” something like the relations between min and max, between
$\lfloor x \rfloor$ and $\lceil x \rceil$, between $x^{\underline{n}}$ and $x^{\overline{n}}$, between gcd and lcm. It’s easy to check that
both of the recurrences ${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}$ and ${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$
amount to the same thing, under this correspondence.
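The duality (6.33) can be observed by actually running the recurrences backward. This Python sketch, with illustrative names throughout, extends both triangles to a square window of integer indices. Nonnegative rows use the recurrence forward. Each negative row is obtained by solving the same recurrence for T(n−1, k−1), starting from the stipulation (6.32) that T(n−1, 0) = 0 when n − 1 ≠ 0. As an assumption matching the tandem picture, entries of negative rows with k > 0 are set to zero.

```python
def extended_triangle(weight, N):
    """T(n, k) for -N <= n, k <= N with T(n,k) = weight(n,k) T(n-1,k) + T(n-1,k-1)
    for all integers n, k, and T(0,k) = [k = 0], T(n,0) = [n = 0] as in (6.32)."""
    T = {(0, k): int(k == 0) for k in range(-N, N + 1)}
    for n in range(1, N + 1):                      # forward: rows 1, 2, ..., N
        for k in range(-N, N + 1):
            T[n, k] = weight(n, k) * T[n - 1, k] + T.get((n - 1, k - 1), 0)
    for n in range(0, -N, -1):                     # backward: rows -1, -2, ..., -N
        for k in range(1, N + 1):
            T[n - 1, k] = 0                        # assumed zero, as in Table 267
        T[n - 1, 0] = 0                            # (6.32), since n - 1 != 0
        for k in range(0, -N, -1):                 # solve the recurrence for T(n-1, k-1)
            T[n - 1, k - 1] = T[n, k] - weight(n, k) * T[n - 1, k]
    return T


N = 8
subset = extended_triangle(lambda n, k: k, N)       # {n k}, recurrence (6.3)
cycle = extended_triangle(lambda n, k: n - 1, N)    # [n k], recurrence (6.8)

# (6.33): [n k] = {-k -n} throughout the window
assert all(cycle[n, k] == subset[-k, -n]
           for n in range(-N, N + 1) for k in range(-N, N + 1))
print([subset[-1, k] for k in range(-1, -6, -1)])   # [1, 1, 2, 6, 24]
```

The printed row reads ${-1 \brace -j} = {j \brack 1} = (j-1)!$ for j = 1, ..., 5, which is (6.5) seen through the duality.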

6.2 EULERIAN NUMBERS 

Another triangle of values pops up now and again, this one due to
Euler [104, §13; 110, page 485], and we denote its elements by $\left\langle {n \atop k} \right\rangle$. The
angle brackets in this case suggest “less than” and “greater than” signs; $\left\langle {n \atop k} \right\rangle$
is the number of permutations $\pi_1 \pi_2 \ldots \pi_n$ of {1, 2, ..., n} that have k ascents,
namely, k places where $\pi_j < \pi_{j+1}$. (Caution: This notation is less standard
than our notations ${n \brack k}$, ${n \brace k}$ for Stirling numbers. But we’ll see that it makes
good sense.)

(Knuth [209, first edition] used $\left\langle {n \atop k+1} \right\rangle$ for $\left\langle {n \atop k} \right\rangle$.)

For example, eleven permutations of {1, 2, 3, 4} have two ascents:

1324, 1423, 2314, 2413, 3412;

1243, 1342, 2341; 2134, 3124, 4123.

(The first row lists the permutations with $\pi_1 < \pi_2 > \pi_3 < \pi_4$; the second row
lists those with $\pi_1 < \pi_2 < \pi_3 > \pi_4$ and $\pi_1 > \pi_2 < \pi_3 < \pi_4$.) Hence $\left\langle {4 \atop 2} \right\rangle = 11$.
Table 268 Euler’s triangle.



Table 268 lists the smallest Eulerian numbers; notice that the trademark
sequence is 1, 11, 11, 1 this time. There can be at most n − 1 ascents, when
n > 0, so we have $\left\langle {n \atop n} \right\rangle = [n = 0]$ on the diagonal of the triangle.

Euler’s triangle, like Pascal’s, is symmetric between left and right. But
in this case the symmetry law is slightly different:

$$ \left\langle {n \atop k} \right\rangle = \left\langle {n \atop n-1-k} \right\rangle , \qquad \text{integer } n > 0 . \tag{6.34} $$

The permutation $\pi_1 \pi_2 \ldots \pi_n$ has n − 1 − k ascents if and only if its “reflection”
$\pi_n \ldots \pi_2 \pi_1$ has k ascents.

Let’s try to find a recurrence for $\left\langle {n \atop k} \right\rangle$. Each permutation $\rho = \rho_1 \ldots \rho_{n-1}$
of {1, ..., n − 1} leads to n permutations of {1, 2, ..., n} if we insert the new
element n in all possible ways. Suppose we put n in position j, obtaining the
permutation $\pi = \rho_1 \ldots \rho_{j-1}\, n\, \rho_j \ldots \rho_{n-1}$. The number of ascents in π is the
same as the number in ρ, if j = 1 or if $\rho_{j-1} < \rho_j$; it’s one greater than the
number in ρ, if $\rho_{j-1} > \rho_j$ or if j = n. Therefore π has k ascents in a total
of $(k+1)\left\langle {n-1 \atop k} \right\rangle$ ways from permutations ρ that have k ascents, plus a total
of $\bigl((n-2) - (k-1) + 1\bigr)\left\langle {n-1 \atop k-1} \right\rangle$ ways from permutations ρ that have k − 1
ascents. The desired recurrence is

$$ \left\langle {n \atop k} \right\rangle = (k+1)\left\langle {n-1 \atop k} \right\rangle + (n-k)\left\langle {n-1 \atop k-1} \right\rangle , \qquad \text{integer } n > 0 . \tag{6.35} $$

Once again we start the recurrence off by setting

$$ \left\langle {0 \atop k} \right\rangle = [k = 0] , \qquad \text{integer } k , \tag{6.36} $$

and we will assume that $\left\langle {n \atop k} \right\rangle = 0$ when k < 0.
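Recurrence (6.35), started with (6.36), can be compared directly with a brute-force count of ascents. The Python sketch below has illustrative helper names. It builds Euler’s triangle both ways for n ≤ 7 and also checks the symmetry (6.34).

```python
from itertools import permutations


def ascents(seq):
    """Number of places j with seq[j] < seq[j+1]."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a < b)


def eulerian_by_brute_force(n):
    """counts[k] = number of permutations of {1, ..., n} with exactly k ascents."""
    counts = [0] * max(n, 1)
    for perm in permutations(range(1, n + 1)):
        counts[ascents(perm)] += 1
    return counts


def entry(row, k):
    """row[k], treating indices outside the stored row as zero."""
    return row[k] if 0 <= k < len(row) else 0


def eulerian_rows(n_max):
    """Rows of Euler's triangle from (6.35) and (6.36); row n > 0 lists k = 0..n-1."""
    rows = [[1]]                                     # <0 k> = [k = 0]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        rows.append([(k + 1) * entry(prev, k) + (n - k) * entry(prev, k - 1)
                     for k in range(n)])
    return rows


rows = eulerian_rows(7)
print(rows[4])                                       # [1, 11, 11, 1]
assert all(rows[n] == eulerian_by_brute_force(n) for n in range(8))
assert all(row == row[::-1] for row in rows[1:])     # symmetry (6.34)
```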
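Eulerian numbers are useful primarily because they provide an unusual
connection between ordinary powers and consecutive binomial coefficients:

$$ x^n = \sum_k \left\langle {n \atop k} \right\rangle \binom{x+k}{n} , \qquad \text{integer } n \ge 0 . \tag{6.37} $$

(This is called “Worpitzky’s identity” [378].) For example, we have

$$ x^2 = \binom{x}{2} + \binom{x+1}{2} , \qquad x^3 = \binom{x}{3} + 4\binom{x+1}{3} + \binom{x+2}{3} , $$

and so on. It’s easy to prove (6.37) by induction (exercise 14).

Western scholars have recently learned of a significant Chinese book by
Li Shan-Lan [249; 265, pages 320–325], published in 1867, which contains
the first known appearance of formula (6.37).

Incidentally, (6.37) gives us yet another way to obtain the sum of the
first n squares: We have $k^2 = \left\langle {2 \atop 0} \right\rangle \binom{k}{2} + \left\langle {2 \atop 1} \right\rangle \binom{k+1}{2} = \binom{k}{2} + \binom{k+1}{2}$, hence

$$ \begin{aligned} 1^2 + 2^2 + \cdots + n^2 &= \left( \binom{1}{2} + \binom{2}{2} + \cdots + \binom{n}{2} \right) + \left( \binom{2}{2} + \binom{3}{2} + \cdots + \binom{n+1}{2} \right) \\ &= \binom{n+1}{3} + \binom{n+2}{3} = \tfrac16 (n+1)n\bigl((n-1) + (n+2)\bigr) . \end{aligned} $$

The Eulerian recurrence (6.35) is a bit more complicated than the Stirling
recurrences (6.3) and (6.8), so we don’t expect the numbers $\left\langle {n \atop k} \right\rangle$ to satisfy as
many simple identities. Still, there are a few:

$$ \left\langle {n \atop m} \right\rangle = \sum_{k=0}^{m} \binom{n+1}{k} (m+1-k)^n (-1)^k ; \tag{6.38} $$

$$ m!\,{n \brace m} = \sum_k \left\langle {n \atop k} \right\rangle \binom{k}{n-m} ; \tag{6.39} $$

$$ \left\langle {n \atop m} \right\rangle = \sum_k {n \brace k} \binom{n-k}{m} (-1)^{n-k-m} k! . \tag{6.40} $$

If we multiply (6.39) by $z^{n-m}$ and sum on m, we get $\sum_m {n \brace m} m!\, z^{n-m} =
\sum_k \left\langle {n \atop k} \right\rangle (z+1)^k$. Replacing z by z − 1 and equating coefficients of $z^k$ gives
(6.40). Thus the last two of these identities are essentially equivalent. The
first identity, (6.38), gives us special values when m is small:

$$ \left\langle {n \atop 0} \right\rangle = 1 ; \qquad \left\langle {n \atop 1} \right\rangle = 2^n - n - 1 ; \qquad \left\langle {n \atop 2} \right\rangle = 3^n - (n+1)2^n + \binom{n+1}{2} . $$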
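Worpitzky’s identity (6.37) and the companion identities (6.38) through (6.40) can all be spot-checked with exact integers. In the following Python sketch, the Eulerian numbers come from the explicit formula (6.38) and the subset numbers from (6.3). The names are illustrative. The sketch checks x ≥ 0 only, because `math.comb` rejects negative arguments.

```python
from functools import lru_cache
from math import comb, factorial


def eulerian(n, m):
    """<n m> from the explicit formula (6.38), for integers n, m >= 0."""
    return sum(comb(n + 1, k) * (m + 1 - k) ** n * (-1) ** k for k in range(m + 1))


@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n k} via (6.3)."""
    if n == 0:
        return int(k == 0)
    return 0 if k <= 0 else k * subset(n - 1, k) + subset(n - 1, k - 1)


for n in range(8):
    for x in range(12):       # (6.37), Worpitzky's identity
        assert x ** n == sum(eulerian(n, k) * comb(x + k, n) for k in range(n + 1))
    for m in range(n + 1):    # (6.39) and (6.40)
        assert factorial(m) * subset(n, m) == sum(eulerian(n, k) * comb(k, n - m)
                                                  for k in range(n + 1))
        assert eulerian(n, m) == sum(subset(n, k) * comb(n - k, m) * (-1) ** (n - k - m)
                                     * factorial(k) for k in range(n - m + 1))
    # special values from (6.38)
    assert eulerian(n, 1) == 2 ** n - n - 1
    assert eulerian(n, 2) == 3 ** n - (n + 1) * 2 ** n + comb(n + 1, 2)

# the sum of squares obtained from (6.37) with n = 2
assert all(sum(k * k for k in range(1, N + 1)) == comb(N + 1, 3) + comb(N + 2, 3)
           for N in range(30))
print([eulerian(4, m) for m in range(4)])   # [1, 11, 11, 1]
```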
Table 270 Second-order Eulerian triangle.



We needn’t dwell further on Eulerian numbers here; it’s usually sufficient
simply to know that they exist, and to have a list of basic identities to fall
back on when the need arises. However, before we leave this topic, we should
take note of yet another triangular pattern of coefficients, shown in Table 270.
We call these “second-order Eulerian numbers” $\left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle$, because they satisfy a
recurrence similar to (6.35) but with n replaced by 2n − 1 in one place:

$$ \left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle = (k+1)\left\langle\!\left\langle {n-1 \atop k} \right\rangle\!\right\rangle + (2n-1-k)\left\langle\!\left\langle {n-1 \atop k-1} \right\rangle\!\right\rangle . \tag{6.41} $$

These numbers have a curious combinatorial interpretation, first noticed by
Gessel and Stanley [147]: If we form permutations of the multiset {1, 1, 2, 2,
..., n, n} with the special property that all numbers between the two occurrences
of m are greater than m, for 1 ≤ m ≤ n, then $\left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle$ is the number of
such permutations that have k ascents. For example, there are eight suitable
single-ascent permutations of {1, 1, 2, 2, 3, 3}:

113322, 133221, 221331, 221133, 223311, 233211, 331122, 331221.

Thus $\left\langle\!\left\langle {3 \atop 1} \right\rangle\!\right\rangle = 8$. The multiset {1, 1, 2, 2, ..., n, n} has a total of

$$ \sum_k \left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle = (2n-1)(2n-3) \ldots (1) = \frac{(2n)^{\underline{n}}}{2^n} \tag{6.42} $$

suitable permutations, because the two appearances of n must be adjacent
and there are 2n − 1 places to insert them within a permutation for n − 1.
For example, when n = 3 the permutation 1221 has five insertion points,
yielding 331221, 133221, 123321, 122331, and 122133. Recurrence (6.41) can
be proved by extending the argument we used for ordinary Eulerian numbers.
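The Gessel–Stanley interpretation can be checked against recurrence (6.41) by brute force for small n. In this Python sketch the helper names are illustrative. By analogy with (6.36), it also assumes the starting row $\left\langle\!\left\langle {0 \atop k} \right\rangle\!\right\rangle = [k = 0]$, which the text does not state explicitly. Finally, it checks the total (6.42).

```python
from itertools import permutations
from math import prod


def is_suitable(seq):
    """True if, for every m, all entries strictly between the two copies of m exceed m."""
    for m in set(seq):
        first = seq.index(m)
        last = len(seq) - 1 - seq[::-1].index(m)
        if any(x < m for x in seq[first + 1:last]):
            return False
    return True


def ascents(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if a < b)


def second_order_by_brute_force(n):
    """counts[k] = suitable permutations of {1,1,2,2,...,n,n} that have k ascents."""
    multiset = [m for m in range(1, n + 1) for _ in range(2)]
    counts = [0] * max(n, 1)
    for seq in set(permutations(multiset)):       # set() drops repeated arrangements
        if is_suitable(seq):
            counts[ascents(seq)] += 1
    return counts


def entry(row, k):
    return row[k] if 0 <= k < len(row) else 0


def second_order_rows(n_max):
    """Rows from (6.41); row n > 0 lists k = 0..n-1 (assumed start: <<0 k>> = [k = 0])."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        rows.append([(k + 1) * entry(prev, k) + (2 * n - 1 - k) * entry(prev, k - 1)
                     for k in range(n)])
    return rows


rows = second_order_rows(6)
print(rows[3])                                                            # [1, 8, 6]
assert all(sum(rows[n]) == prod(range(1, 2 * n, 2)) for n in range(7))    # (6.42)
assert all(second_order_by_brute_force(n) == rows[n] for n in range(5))
```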
So 1/x is a
polynomial?

(Sorry about that.)


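Second-order Eulerian numbers are important chiefly because of their
connection with Stirling numbers [148]: We have, by induction on n,

$$ {x \brace x-n} = \sum_k \left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle \binom{x+n-1-k}{2n} , \qquad \text{integer } n \ge 0 ; \tag{6.43} $$

$$ {x \brack x-n} = \sum_k \left\langle\!\left\langle {n \atop k} \right\rangle\!\right\rangle \binom{x+k}{2n} , \qquad \text{integer } n \ge 0 . \tag{6.44} $$

For example,

$$ {x \brace x-1} = \binom{x}{2} , \qquad {x \brack x-1} = \binom{x}{2} ; $$

$$ {x \brace x-2} = \binom{x+1}{4} + 2\binom{x}{4} , \qquad {x \brack x-2} = \binom{x}{4} + 2\binom{x+1}{4} ; $$

$$ {x \brace x-3} = \binom{x+2}{6} + 8\binom{x+1}{6} + 6\binom{x}{6} , \qquad {x \brack x-3} = \binom{x}{6} + 8\binom{x+1}{6} + 6\binom{x+2}{6} . $$

(We already encountered the case n = 1 in (6.7).) These identities hold
whenever x is an integer and n is a nonnegative integer. Since the right-hand
sides are polynomials in x, we can use (6.43) and (6.44) to define Stirling
numbers ${x \brace x-n}$ and ${x \brack x-n}$ for arbitrary real (or complex) values of x.

If n > 0, these polynomials ${x \brace x-n}$ and ${x \brack x-n}$ are zero when x = 0, x = 1,
..., and x = n; therefore they are divisible by (x − 0), (x − 1), ..., and (x − n).
It’s interesting to look at what’s left after these known factors are divided out.
We define the Stirling polynomials $\sigma_n(x)$ by the rule

$$ \sigma_n(x) = {x \brack x-n} \Big/ \bigl( x(x-1) \ldots (x-n) \bigr) . \tag{6.45} $$

(The degree of $\sigma_n(x)$ is n − 1.) The first few cases are

$$ \begin{aligned} \sigma_0(x) &= 1/x ; \\ \sigma_1(x) &= 1/2 ; \\ \sigma_2(x) &= (3x-1)/24 ; \\ \sigma_3(x) &= (x^2 - x)/48 ; \\ \sigma_4(x) &= (15x^3 - 30x^2 + 5x + 2)/5760 . \end{aligned} $$

They can be computed via the second-order Eulerian numbers; for example,

$$ \sigma_3(x) = \bigl( (x-4)(x-5) + 8(x-4)(x+1) + 6(x+2)(x+1) \bigr) / 6! . $$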
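Combining definition (6.45) with (6.44) gives a direct way to evaluate $\sigma_n(x)$ at any rational point other than 0, 1, ..., n. The Python sketch below uses illustrative names and exact arithmetic via `fractions.Fraction`. It computes $\sigma_n(x)$ that way and compares the results with the closed forms listed above at a few non-integer points.

```python
from fractions import Fraction
from math import factorial


def second_order_row(n):
    """Row n of the second-order Eulerian triangle, via (6.41)."""
    row = [1]
    for m in range(1, n + 1):
        prev = row
        row = [(k + 1) * (prev[k] if k < len(prev) else 0)
               + (2 * m - 1 - k) * (prev[k - 1] if k >= 1 else 0)
               for k in range(m)]
    return row


def gen_binom(y, m):
    """Binomial coefficient (y choose m) for rational y and integer m >= 0."""
    result = Fraction(1)
    for j in range(m):
        result *= y - j
    return result / factorial(m)


def sigma(n, x):
    """Stirling polynomial sigma_n(x) from (6.44) and (6.45); x must avoid 0, 1, ..., n."""
    x = Fraction(x)
    numerator = sum(c * gen_binom(x + k, 2 * n) for k, c in enumerate(second_order_row(n)))
    denominator = Fraction(1)
    for j in range(n + 1):
        denominator *= x - j
    return numerator / denominator


closed_forms = [
    lambda x: 1 / x,
    lambda x: Fraction(1, 2),
    lambda x: (3 * x - 1) / 24,
    lambda x: (x * x - x) / 48,
    lambda x: (15 * x**3 - 30 * x**2 + 5 * x + 2) / 5760,
]
for x in (Fraction(7, 3), Fraction(-5, 2), Fraction(11, 4)):
    for n, f in enumerate(closed_forms):
        assert sigma(n, x) == f(x)
print(sigma(3, Fraction(7, 3)))
```

Evaluating at integer points recovers Stirling numbers. For example, $5!\,\sigma_3(5)$ equals ${5 \brack 2} = 50$, because (6.45) with x = 5 divides ${5 \brack 2}$ by 5 · 4 · 3 · 2.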
Table 272 Stirling convolution formulas.

$$ rs \sum_{k=0}^{n} \sigma_k(r+tk)\, \sigma_{n-k}\bigl(s + t(n-k)\bigr) = (r+s)\, \sigma_n(r+s+tn) \tag{6.46} $$

$$ s \sum_{k=0}^{n} k\, \sigma_k(r+tk)\, \sigma_{n-k}\bigl(s + t(n-k)\bigr) = n\, \sigma_n(r+s+tn) \tag{6.47} $$

$$ {n \brace m} = (-1)^{n-m+1} \frac{n!}{(m-1)!}\, \sigma_{n-m}(-m) \tag{6.48} $$

$$ {n \brack m} = \frac{n!}{(m-1)!}\, \sigma_{n-m}(n) \tag{6.49} $$

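It turns out that these polynomials satisfy two very pretty identities:

$$ \left( \frac{z e^z}{e^z - 1} \right)^x = x \sum_{n \ge 0} \sigma_n(x)\, z^n ; \tag{6.50} $$

$$ \left( \frac{1}{z} \ln \frac{1}{1-z} \right)^x = x \sum_{n \ge 0} \sigma_n(x+n)\, z^n . \tag{6.51} $$

And in general, if $S_t(z)$ is the power series that satisfies

$$ \ln\bigl(1 - z S_t(z)^{t-1}\bigr) = -z S_t(z)^t , \tag{6.52} $$

then

$$ S_t(z)^x = x \sum_{n \ge 0} \sigma_n(x+tn)\, z^n . \tag{6.53} $$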

Therefore we can obtain general convolution formulas for Stirling numbers, as 
we did for binomial coefficients in Table 202; the results appear in Table 272. 
When a sum of Stirling numbers doesn’t fit the identities of Table 264 or 265, 
Table 272 may be just the ticket. (An example appears later in this chapter, 
following equation (6.100). Exercise 7.19 discusses the general principles of 
convolutions based on identities like (6.50) and (6.53).) 


6.3 HARMONIC NUMBERS 

It’s time now to take a closer look at harmonic numbers, which we
first met back in Chapter 2:

$$ H_n = 1 + \frac12 + \frac13 + \cdots + \frac1n = \sum_{k=1}^{n} \frac1k , \qquad \text{integer } n \ge 0 . \tag{6.54} $$
This must be
Table 273.


These numbers appear so often in the analysis of algorithms that computer
scientists need a special notation for them. We use $H_n$, the ‘H’ standing for
“harmonic,” since a tone of wavelength 1/n is called the nth harmonic of a
tone whose wavelength is 1. The first few values look like this:

n    0  1    2     3      4       5      6        7        8          9         10
H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280  7129/2520  7381/2520

Exercise 21 shows that $H_n$ is never an integer when n > 1.
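Exact rational arithmetic reproduces the table above and lets us spot-check the claim of exercise 21 over a modest range. Here is a short Python sketch using `fractions.Fraction`; the function name is illustrative.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction; H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


print([str(harmonic(n)) for n in range(11)])
# ['0', '1', '3/2', '11/6', '25/12', '137/60', '49/20', '363/140', '761/280', '7129/2520', '7381/2520']

# Exercise 21 says H_n is never an integer for n > 1; spot-check n = 2, ..., 299.
h = Fraction(1)                       # H_1
for n in range(2, 300):
    h += Fraction(1, n)
    assert h.denominator != 1
```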

Here’s a card trick, based on an idea by R. T. Sharp [325], that illustrates 
how the harmonic numbers arise naturally in simple situations. Given n cards 
and a table, we’d like to create the largest possible overhang by stacking the 
cards up over the table’s edge, subject to the laws of gravity: 



To define the problem a bit more, we require the edges of the cards to be 
parallel to the edge of the table; otherwise we could increase the overhang by 
rotating the cards so that their corners stick out a little farther. And to make 
the answer simpler, we assume that each card is 2 units long. 

With one card, we get maximum overhang when its center of gravity is 
just above the edge of the table. The center of gravity is in the middle of the 
card, so we can create half a cardlength, or 1 unit, of overhang. 

With two cards, it’s not hard to convince ourselves that we get maximum 
overhang when the center of gravity of the top card is just above the edge 
of the second card, and the center of gravity of both cards combined is just 
above the edge of the table. The joint center of gravity of two cards will be 
in the middle of their common part, so we are able to achieve an additional 
half unit of overhang. 

This pattern suggests a general method, where we place cards so that the
center of gravity of the top k cards lies just above the edge of the (k+1)st card
(which supports those top k). The table plays the role of the (n+1)st card. To
express this condition algebraically, we can let $d_k$ be the distance from the
extreme edge of the top card to the corresponding edge of the kth card from
the top. Then $d_1 = 0$, and we want to make $d_{k+1}$ the center of gravity of the
first k cards:

$$ d_{k+1} = \frac{(1 + d_1) + (1 + d_2) + \cdots + (1 + d_k)}{k} , \qquad \text{for } 1 \le k \le n . \tag{6.55} $$
(The center of gravity of k objects, having respective weights $w_1, \ldots, w_k$
and having respective centers of gravity at positions $p_1, \ldots, p_k$, is at position
$(w_1 p_1 + \cdots + w_k p_k)/(w_1 + \cdots + w_k)$.) We can rewrite this recurrence in two
equivalent forms

$$ k d_{k+1} = k + d_1 + \cdots + d_{k-1} + d_k , \qquad k \ge 0 ; $$

$$ (k-1) d_k = k - 1 + d_1 + \cdots + d_{k-1} , \qquad k \ge 1 . $$

Subtracting these equations tells us that

$$ k d_{k+1} - (k-1) d_k = 1 + d_k , \qquad k \ge 1 ; $$

hence $d_{k+1} = d_k + 1/k$. The second card will be offset half a unit past the
third, which is a third of a unit past the fourth, and so on. The general
formula

$$ d_{k+1} = H_k \tag{6.56} $$

follows by induction, and if we set k = n we get $d_{n+1} = H_n$ as the total
overhang when n cards are stacked as described.

Could we achieve greater overhang by holding back, not pushing each card to an extreme position but storing up “potential gravitational energy” for a later advance? No; any well-balanced card placement has

$$d_{k+1} \le \frac{(1+d_1) + (1+d_2) + \cdots + (1+d_k)}{k}, \qquad 1 \le k \le n.$$

Furthermore $d_1 = 0$. It follows by induction that $d_{k+1} \le H_k$.

Notice that it doesn’t take too many cards for the top one to be completely past the edge of the table. We need an overhang of more than one cardlength, which is 2 units. The first harmonic number to exceed 2 is $H_4 = \frac{25}{12}$, so we need only four cards.

And with 52 cards we have an $H_{52}$-unit overhang, which turns out to be $H_{52}/2 \approx 2.27$ cardlengths. (We will soon learn a formula that tells us how to compute an approximate value of $H_n$ for large n without adding up a whole bunch of fractions.)

An amusing problem called the “worm on the rubber band” shows harmonic numbers in another guise. A slow but persistent worm, W, starts at one end of a meter-long rubber band and crawls one centimeter per minute toward the other end. At the end of each minute, an equally persistent keeper of the band, K, whose sole purpose in life is to frustrate W, stretches it one meter. Thus after one minute of crawling, W is 1 centimeter from the start and 99 from the finish; then K stretches it one meter. During the stretching operation W maintains his relative position, 1% from the start and 99% from the finish; so W is now 2 cm from the starting point and 198 cm from the goal. After W crawls for another minute the score is 3 cm traveled and 197 to go; but K stretches, and the distances become 4.5 and 295.5. And so on. Does the worm ever reach the finish? He keeps moving, but the goal seems to move away even faster. (We’re assuming an infinite longevity for K and W, an infinite elasticity of the band, and an infinitely tiny worm.)

Anyone who actually tries to achieve this maximum overhang with 52 cards is probably not dealing with a full deck, or maybe he’s a real joker.

Metric units make this problem more scientific.

A flatworm, eh?

Let’s write down some formulas. When K stretches the rubber band, the fraction of it that W has crawled stays the same. Thus he crawls 1/100th of it the first minute, 1/200th the second, 1/300th the third, and so on. After n minutes the fraction of the band that he’s crawled is

$$\frac{1}{100}\left(\frac11 + \frac12 + \frac13 + \cdots + \frac1n\right) = \frac{H_n}{100}. \qquad (6.57)$$

So he reaches the finish if $H_n$ ever surpasses 100.

We’ll see how to estimate $H_n$ for large n soon; for now, let’s simply check our analysis by considering how “Superworm” would perform in the same situation. Superworm, unlike W, can crawl 50 cm per minute; so she will crawl $H_n/2$ of the band length after n minutes, according to the argument we just gave. If our reasoning is correct, Superworm should finish before n reaches 4, since $H_4 > 2$. And yes, a simple calculation shows that Superworm has only $33\frac13$ cm left to travel after three minutes have elapsed. She finishes in 3 minutes and 40 seconds flat.
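The worm’s progress is easy to check by direct simulation. The sketch below is a minimal illustration, assuming the setup described above (a 100 cm band stretched by 100 cm at the end of every minute, with the worm keeping its relative position during each stretch); the function name `finish_time` is our own. Exact rational arithmetic from Python’s `fractions` module avoids rounding, so Superworm’s time comes out exactly.

```python
from fractions import Fraction


def finish_time(speed_cm, band_cm=100, stretch_cm=100, max_minutes=1000):
    """Minutes a worm crawling `speed_cm` cm/min needs to reach the far end.

    Returns an exact Fraction, or None if the worm has not arrived
    within `max_minutes` minutes.
    """
    pos = Fraction(0)             # distance crawled from the start, in cm
    length = Fraction(band_cm)    # current length of the band, in cm
    for minute in range(max_minutes):
        if pos + speed_cm >= length:
            # The worm arrives partway through this minute.
            return minute + (length - pos) / speed_cm
        pos += speed_cm
        # K stretches the band uniformly, so the fraction pos/length is unchanged.
        new_length = length + stretch_cm
        pos = pos * new_length / length
        length = new_length
    return None


print(finish_time(50))                   # 11/3, i.e. 3 minutes 40 seconds
print(finish_time(1, max_minutes=100))   # None: W is nowhere near the end yet
```

The same loop also reproduces the hand calculation for W (2 cm after the first stretch, 4.5 cm after the second), since it applies exactly the crawl-then-stretch rule of the text.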

Harmonic numbers appear also in Stirling’s triangle. Let’s try to find a closed form for ${n \brack 2}$, the number of permutations of n objects that have exactly two cycles. Recurrence (6.8) tells us that

$${n+1 \brack 2} = n{n \brack 2} + {n \brack 1} = n{n \brack 2} + (n-1)!, \qquad \text{if } n > 0;$$

and this recurrence is a natural candidate for the summation factor technique of Chapter 2:

$$\frac{1}{n!}{n+1 \brack 2} = \frac{1}{(n-1)!}{n \brack 2} + \frac1n.$$

Unfolding this recurrence tells us that $\frac{1}{n!}{n+1 \brack 2} = H_n$; hence

$${n+1 \brack 2} = n!\,H_n. \qquad (6.58)$$
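Identity (6.58) can be spot-checked by building Stirling’s cycle triangle from recurrence (6.8), ${n+1\brack k} = n{n\brack k} + {n\brack k-1}$, and comparing the k = 2 column with $n!\,H_n$. A small sketch in Python; the helper names are ours.

```python
from fractions import Fraction
from math import factorial


def stirling_cycle_rows(N):
    """Rows 0..N of the Stirling cycle-number triangle [n, k], via recurrence (6.8)."""
    rows = [[1]]                              # [0, 0] = 1
    for n in range(N):
        prev = rows[-1] + [0]                 # pad so prev[k] exists for k = n + 1
        row = [0] * (n + 2)
        for k in range(1, n + 2):
            row[k] = n * prev[k] + prev[k - 1]
        rows.append(row)
    return rows


def harmonic(n):
    """H_n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


rows = stirling_cycle_rows(12)
for n in range(1, 12):
    assert rows[n + 1][2] == factorial(n) * harmonic(n)   # (6.58)
print(rows[5])   # [0, 24, 50, 35, 10, 1]
```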


We proved in Chapter 2 that the harmonic series $\sum_k 1/k$ diverges, which means that $H_n$ gets arbitrarily large as $n \to \infty$. But our proof was indirect; we found that a certain infinite sum (2.58) gave different answers when it was rearranged, hence $\sum_k 1/k$ could not be bounded. The fact that $H_n \to \infty$ seems counterintuitive, because it implies among other things that a large enough stack of cards will overhang a table by a mile or more, and that the worm W will eventually reach the end of his rope. Let us therefore take a closer look at the size of $H_n$ when n is large.

The simplest way to see that $H_n \to \infty$ is probably to group its terms according to powers of 2. We put one term into group 1, two terms into group 2, four into group 3, eight into group 4, and so on:

$$\underbrace{\frac11}_{\text{group 1}} + \underbrace{\frac12 + \frac13}_{\text{group 2}} + \underbrace{\frac14 + \frac15 + \frac16 + \frac17}_{\text{group 3}} + \underbrace{\frac18 + \frac19 + \cdots + \frac1{15}}_{\text{group 4}} + \cdots.$$

Both terms in group 2 are between $\frac14$ and $\frac12$, so the sum of that group is between $2\cdot\frac14 = \frac12$ and $2\cdot\frac12 = 1$. All four terms in group 3 are between $\frac18$ and $\frac14$, so their sum is also between $\frac12$ and 1. In fact, each of the $2^{k-1}$ terms in group k is between $2^{-k}$ and $2^{1-k}$; hence the sum of each individual group is between $\frac12$ and 1.

This grouping procedure tells us that if n is in group k, we must have $H_n > k/2$ and $H_n \le k$ (by induction on k). Thus $H_n \to \infty$, and in fact

$$\frac{\lfloor \lg n\rfloor + 1}{2} < H_n \le \lfloor \lg n\rfloor + 1. \qquad (6.59)$$

We now know $H_n$ within a factor of 2. Although the harmonic numbers approach infinity, they approach it only logarithmically, that is, quite slowly.

Better bounds can be found with just a little more work and a dose of calculus. We learned in Chapter 2 that $H_n$ is the discrete analog of the continuous function $\ln n$. The natural logarithm is defined as the area under a curve, so a geometric comparison is suggested:



The area under the curve between 1 and n, which is $\int_1^n dx/x = \ln n$, is less than the area of the n rectangles, which is $\sum_{k=1}^n 1/k = H_n$. Thus $\ln n < H_n$; this is a sharper result than we had in (6.59). And by placing the rectangles a little differently, we get a similar upper bound:

We should call them the worm numbers, they’re so slow.

“I now see a way too how yᵉ aggregate of yᵉ termes of Musicall progressions may bee found (much after yᵉ same manner) by Logarithms, but yᵉ calculations for finding out those rules would bee still more troublesom.”

— I. Newton [280]



This time the area of the n rectangles, $H_n$, is less than the area of the first rectangle plus the area under the curve. We have proved that

$$\ln n < H_n < \ln n + 1, \qquad \text{for } n > 1. \qquad (6.60)$$

We now know the value of $H_n$ with an error of at most 1.
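Both sets of bounds are easy to confirm numerically for moderate n. The sketch below (plain Python in floating point; the default limit is an arbitrary choice) checks (6.59), using `n.bit_length() - 1` for $\lfloor \lg n\rfloor$, and (6.60).

```python
import math


def check_harmonic_bounds(limit=100_000):
    """Check (6.59) for 1 <= n <= limit and (6.60) for 1 < n <= limit; return H_limit."""
    h = 0.0
    for n in range(1, limit + 1):
        h += 1.0 / n                          # running value of H_n
        floor_lg = n.bit_length() - 1         # floor(lg n) for a positive integer n
        assert (floor_lg + 1) / 2 < h <= floor_lg + 1, n      # (6.59)
        if n > 1:
            assert math.log(n) < h < math.log(n) + 1, n       # (6.60)
    return h


print(check_harmonic_bounds())   # H_100000, roughly 12.09
```

The logarithmic bounds of (6.60) pin $H_n$ down to an interval of width 1, while the grouping bounds of (6.59) only bracket it within a factor of 2.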

“Second order” harmonic numbers $H_n^{(2)}$ arise when we sum the squares of the reciprocals, instead of summing simply the reciprocals:

$$H_n^{(2)} = 1 + \frac14 + \cdots + \frac1{n^2} = \sum_{k=1}^n \frac1{k^2}.$$

Similarly, we define harmonic numbers of order r by summing $(-r)$th powers:

$$H_n^{(r)} = \sum_{k=1}^n \frac1{k^r}. \qquad (6.61)$$

If r > 1, these numbers approach a limit as $n \to \infty$; we noted in exercise 2.31 that this limit is conventionally called Riemann’s zeta function:

$$\zeta(r) = H_\infty^{(r)} = \sum_{k\ge1} \frac1{k^r}. \qquad (6.62)$$


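Euler [103] discovered a neat way to use generalized harmonic numbers to approximate the ordinary ones, $H_n^{(1)}$. Let’s consider the infinite series

$$\ln\left(\frac{k}{k-1}\right) = \frac1k + \frac1{2k^2} + \frac1{3k^3} + \frac1{4k^4} + \cdots, \qquad (6.63)$$

which converges when k > 1. The left-hand side is $\ln k - \ln(k-1)$; therefore if we sum both sides for $2 \le k \le n$ the left-hand sum telescopes and we get

$$\ln n - \ln 1 = \sum_{k=2}^n\left(\frac1k + \frac1{2k^2} + \frac1{3k^3} + \frac1{4k^4} + \cdots\right) = (H_n - 1) + \tfrac12\bigl(H_n^{(2)} - 1\bigr) + \tfrac13\bigl(H_n^{(3)} - 1\bigr) + \tfrac14\bigl(H_n^{(4)} - 1\bigr) + \cdots.$$

Rearranging, we have an expression for the difference between $H_n$ and $\ln n$:

$$H_n - \ln n = 1 - \tfrac12\bigl(H_n^{(2)} - 1\bigr) - \tfrac13\bigl(H_n^{(3)} - 1\bigr) - \tfrac14\bigl(H_n^{(4)} - 1\bigr) - \cdots.$$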


When $n \to \infty$, the right-hand side approaches the limiting value

$$1 - \tfrac12\bigl(\zeta(2) - 1\bigr) - \tfrac13\bigl(\zeta(3) - 1\bigr) - \tfrac14\bigl(\zeta(4) - 1\bigr) - \cdots,$$

which is now known as Euler’s constant and conventionally denoted by the Greek letter $\gamma$. In fact, $\zeta(r) - 1$ is approximately $1/2^r$, so this infinite series converges rather rapidly and we can compute the decimal value

$$\gamma = 0.5772156649\ldots. \qquad (6.64)$$
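Euler’s series can be summed numerically in a few lines. In this sketch (helper names ours), $\zeta(r) - 1$ is approximated by adding the terms $k^{-r}$ for $2 \le k \le K$ directly and estimating the remainder with the leading terms of the Euler–Maclaurin formula; that tail estimate is our own choice, not part of Euler’s argument, but it is far more accurate than double precision needs. Because the terms shrink roughly like $2^{-r}/r$, about 60 values of r suffice.

```python
import math


def zeta_minus_one(r, K=2000):
    """Approximate zeta(r) - 1 for an integer r >= 2.

    Adds k**-r for 2 <= k <= K with math.fsum, then estimates the tail
    sum_{k>K} k**-r by the leading Euler-Maclaurin terms
    K**(1-r)/(r-1) - K**-r/2 + r*K**(-r-1)/12.
    """
    head = math.fsum(k ** -r for k in range(2, K + 1))
    tail = K ** (1 - r) / (r - 1) - K ** -r / 2 + r * K ** (-r - 1) / 12
    return head + tail


def euler_gamma(max_r=60):
    """gamma = 1 - sum_{r>=2} (zeta(r) - 1)/r, truncated after r = max_r."""
    return 1 - math.fsum(zeta_minus_one(r) / r for r in range(2, max_r + 1))


print(euler_gamma())   # close to 0.5772156649...
```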

Euler’s argument establishes the limiting relation

$$\lim_{n\to\infty}(H_n - \ln n) = \gamma; \qquad (6.65)$$

thus $H_n$ lies about 58% of the way between the two extremes in (6.60). We are gradually homing in on its value.

“Therefore we have found the value of this constant quantity C, namely C = 0.577218.”

— L. Euler [103]

Further refinements are possible, as we will see in Chapter 9. We will prove, for example, that

$$H_n = \ln n + \gamma + \frac1{2n} - \frac1{12n^2} + \frac{\epsilon_n}{120n^4}, \qquad 0 < \epsilon_n < 1. \qquad (6.66)$$

This formula allows us to conclude that the millionth harmonic number is

$$H_{1000000} \approx 14.3927267228657236313811275,$$

without adding up a million fractions. Among other things, this implies that a stack of a million cards can overhang the edge of a table by more than seven cardlengths.
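Here is a quick numerical comparison of (6.66) with brute-force summation, in Python double precision; at this size the $\epsilon_n/(120n^4)$ term is far below rounding error, so the sketch simply drops it. The constant `GAMMA` is γ rounded to double precision (16 significant digits, more than (6.64) shows).

```python
import math

GAMMA = 0.5772156649015329   # Euler's constant, rounded to double precision


def harmonic_approx(n):
    """ln n + gamma + 1/(2n) - 1/(12n^2), i.e. (6.66) without the tiny epsilon term."""
    return math.log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n * n)


n = 1_000_000
direct = math.fsum(1 / k for k in range(1, n + 1))   # a million fractions, added carefully
approx = harmonic_approx(n)
print(direct, approx, abs(direct - approx))
print(approx / 2 > 7)   # True: more than seven cardlengths of overhang
```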

What does (6.66) tell us about the worm on the rubber band? Since $H_n$ is unbounded, the worm will definitely reach the end, when $H_n$ first exceeds 100. Our approximation to $H_n$ says that this will happen when n is approximately

$$e^{100-\gamma} \approx e^{99.423}.$$

In fact, exercise 9.49 proves that the critical value of n is either $\lfloor e^{100-\gamma}\rfloor$ or $\lceil e^{100-\gamma}\rceil$. We can imagine W’s triumph when he crosses the finish line at last, much to K’s chagrin, some 287 decillion centuries after his long crawl began. (The rubber band will have stretched to more than $10^{27}$ light years long; its molecules will be pretty far apart.)

Well, they can’t really go at it this long; the world will have ended much earlier, when the Tower of Brahma is fully transferred.
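The floor-or-ceiling claim can be tried out on smaller targets where direct summation is feasible: replace 100 by a small integer T and find the first n with $H_n > T$. The sketch below (our own function names, floating point throughout) does this for T = 2, ..., 10, then redoes the unit conversions for T = 100, assuming a year of 365.25 days and a light year of about $9.4607\times10^{15}$ meters.

```python
import math

GAMMA = 0.5772156649015329   # Euler's constant


def first_n_exceeding(T):
    """Smallest n with H_n > T, found by adding 1/n until the sum passes T."""
    h, n = 0.0, 0
    while h <= T:
        n += 1
        h += 1.0 / n
    return n


for T in range(2, 11):
    x = math.exp(T - GAMMA)
    n = first_n_exceeding(T)
    assert n in (math.floor(x), math.ceil(x)), (T, n, x)
    print(T, n, round(x, 2))

# T = 100: the worm's crawl, in minutes, converted to centuries and to light years.
minutes = math.exp(100 - GAMMA)                 # about 1.5e43
centuries = minutes / (60 * 24 * 365.25 * 100)
band_meters = 1 + minutes                       # the band grows by 1 m per minute
light_years = band_meters / 9.4607e15
print(f"{centuries:.3e} centuries, {light_years:.2e} light years")
```

The century count comes out near $2.87\times10^{35}$, which is the “287 decillion” of the text (with a decillion taken as $10^{33}$).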
6.4 HARMONIC SUMMATION

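Now let’s look at some sums involving harmonic numbers, starting with a review of a few ideas we learned in Chapter 2. We proved in (2.36) and (2.57) that

$$\sum_{0\le k<n} H_k = nH_n - n; \qquad (6.67)$$

$$\sum_{0\le k<n} kH_k = \frac{n(n-1)}{2}H_n - \frac{n(n-1)}{4}. \qquad (6.68)$$

Let’s be bold and take on a more general sum, which includes both of these as special cases: What is the value of

$$\sum_{0\le k<n}\binom{k}{m}H_k$$

when m is a nonnegative integer?

The approach that worked best for (6.67) and (6.68) in Chapter 2 was called summation by parts. We wrote the summand in the form $u(k)\Delta v(k)$, and we applied the general identity

$$\sum\nolimits_a^b u(x)\Delta v(x)\,\delta x = u(x)v(x)\Big|_a^b - \sum\nolimits_a^b v(x+1)\Delta u(x)\,\delta x. \qquad (6.69)$$

Remember? The sum that faces us now, $\sum_{0\le k<n}\binom{k}{m}H_k$, is a natural for this method because we can let

$$u(k) = H_k, \qquad \Delta u(k) = H_{k+1} - H_k = \frac{1}{k+1};$$

$$v(k) = \binom{k}{m+1}, \qquad \Delta v(k) = \binom{k+1}{m+1} - \binom{k}{m+1} = \binom{k}{m}.$$

Hence

$$\sum_{0\le k<n}\binom{k}{m}H_k = \binom{n}{m+1}H_n - \sum_{0\le k<n}\binom{k+1}{m+1}\frac{1}{k+1} = \binom{n}{m+1}H_n - \frac{1}{m+1}\sum_{0\le k<n}\binom{k}{m} = \binom{n}{m+1}H_n - \frac{1}{m+1}\binom{n}{m+1}.$$

Thus we have the answer we seek: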

$$\sum_{0\le k<n}\binom{k}{m}H_k = \binom{n}{m+1}\left(H_n - \frac{1}{m+1}\right). \qquad (6.70)$$

(This checks nicely with (6.67) and (6.68) when m = 0 and m = 1.)
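Formula (6.70) is easy to test with exact arithmetic for small m and n. A short check in Python; `math.comb` and `fractions.Fraction` are standard library, and the ranges tested are arbitrary.

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def lhs(m, n):
    """sum_{0 <= k < n} C(k, m) H_k, added term by term."""
    return sum((comb(k, m) * harmonic(k) for k in range(n)), Fraction(0))


def rhs(m, n):
    """Closed form (6.70): C(n, m+1) (H_n - 1/(m+1))."""
    return comb(n, m + 1) * (harmonic(n) - Fraction(1, m + 1))


for m in range(6):
    for n in range(15):
        assert lhs(m, n) == rhs(m, n), (m, n)
print("(6.70) holds for 0 <= m < 6, 0 <= n < 15")
```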

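The next example sum uses division instead of multiplication: Let us try to evaluate

$$S_n = \sum_{k=1}^n \frac{H_k}{k}.$$

If we expand $H_k$ by its definition, we obtain a double sum,

$$S_n = \sum_{1\le j\le k\le n}\frac{1}{j\cdot k}.$$

Now another method from Chapter 2 comes to our aid; equation (2.33) tells us that

$$S_n = \frac12\left(\left(\sum_{k=1}^n\frac1k\right)^2 + \sum_{k=1}^n\frac1{k^2}\right) = \frac12\left(H_n^2 + H_n^{(2)}\right). \qquad (6.71)$$

It turns out that we could also have obtained this answer in another way if we had tried to sum by parts (see exercise 26).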
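A direct check of (6.71) with exact fractions, as a small sketch (function names ours):

```python
from fractions import Fraction


def harmonic(n, r=1):
    """Generalized harmonic number H_n^(r) as an exact fraction."""
    return sum((Fraction(1, k ** r) for k in range(1, n + 1)), Fraction(0))


def S(n):
    """S_n = sum_{k=1}^n H_k / k, computed directly."""
    return sum((harmonic(k) / k for k in range(1, n + 1)), Fraction(0))


for n in range(1, 20):
    assert S(n) == (harmonic(n) ** 2 + harmonic(n, 2)) / 2   # (6.71)
print(S(3))   # 85/36
```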

Now let’s try our hands at a more difficult problem [354], which doesn’t submit to summation by parts:

$$U_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(n-k)^n, \qquad \text{integer } n \ge 1.$$

(This sum doesn’t explicitly mention harmonic numbers either; but who knows when they might turn up?)

(Not to give the answer away or anything.)

We will solve this problem in two ways, one by grinding out the answer and the other by being clever and/or lucky. First, the grinder’s approach. We expand $(n-k)^n$ by the binomial theorem, so that the troublesome k in the denominator will combine with the numerator:

$$U_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}\sum_j\binom{n}{j}(-k)^j n^{n-j} = \sum_j\binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k\ge1}\binom{n}{k}(-1)^k k^{j-1}.$$

This isn’t quite the mess it seems, because the $k^{j-1}$ in the inner sum is a polynomial in k, and identity (5.40) tells us that we are simply taking the nth difference of this polynomial. Almost; first we must clean up a few things. For one, $k^{j-1}$ isn’t a polynomial if j = 0; so we will need to split off that term and handle it separately. For another, we’re missing the term k = 0 from the formula for nth difference; that term is nonzero when j = 1, so we had better restore it (and subtract it out again). The result is

$$\begin{aligned}
U_n &= \sum_{j\ge1}\binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k\ge0}\binom{n}{k}(-1)^k k^{j-1}\\
&\quad - \sum_{j\ge1}\binom{n}{j}(-1)^{j-1}n^{n-j}\binom{n}{0}0^{j-1}\\
&\quad + n^n\sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}.
\end{aligned}$$



OK, now the top line (the only remaining double sum) is zero: It’s the sum of multiples of nth differences of polynomials of degree less than n, and such nth differences are zero. The second line is zero except when j = 1, when it equals $-n^n$. So the third line is the only residual difficulty; we have reduced the original problem to a much simpler sum:

$$U_n = n^n(T_n - 1), \qquad \text{where } T_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}. \qquad (6.72)$$

For example, $U_3 = \binom31\frac81 - \binom32\frac12 = \frac{45}{2}$; $T_3 = \binom31\frac11 - \binom32\frac12 + \binom33\frac13 = \frac{11}{6}$; hence $U_3 = 27(T_3 - 1)$ as claimed.

How can we evaluate $T_n$? One way is to replace $\binom nk$ by $\binom{n-1}{k} + \binom{n-1}{k-1}$, obtaining a simple recurrence for $T_n$ in terms of $T_{n-1}$. But there’s a more instructive way: We had a similar formula in (5.41), namely

$$\sum_k\binom nk\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\ldots(x+n)}.$$

If we subtract out the term for k = 0 and set x = 0, we get $-T_n$. So let’s do it:

$$\begin{aligned}
T_n &= \left(\frac1x - \frac{n!}{x(x+1)\ldots(x+n)}\right)\bigg|_{x=0} = \left(\frac{(x+1)\ldots(x+n) - n!}{x(x+1)\ldots(x+n)}\right)\bigg|_{x=0}\\
&= \left(\frac{{n+1\brack n+1}x^{n-1} + \cdots + {n+1\brack 2}}{(x+1)\ldots(x+n)}\right)\bigg|_{x=0} = \frac{1}{n!}{n+1\brack 2}.
\end{aligned}$$

(We have used the expansion (6.11) of $(x+1)\ldots(x+n) = x^{\overline{n+1}}/x$; we can divide x out of the numerator because ${n+1\brack 1} = n!$.) But we know from (6.58) that ${n+1\brack 2} = n!\,H_n$; hence $T_n = H_n$, and we have the answer:

$$U_n = n^n(H_n - 1). \qquad (6.73)$$


That’s one approach. The other approach will be to try to evaluate a much more general sum,

$$U_n(x,y) = \sum_{k\ge1}\binom nk\frac{(-1)^{k-1}}{k}(x+ky)^n, \qquad \text{integer } n \ge 0; \qquad (6.74)$$

the value of the original $U_n$ will drop out as the special case $U_n(n,-1)$. (We are encouraged to try for more generality because the previous derivation “threw away” most of the details of the given problem; somehow those details must be irrelevant, because the nth difference wiped them away.)

We could replay the previous derivation with small changes and discover the value of $U_n(x,y)$. Or we could replace $(x+ky)^n$ by $(x+ky)^{n-1}(x+ky)$ and then replace $\binom nk$ by $\binom{n-1}{k} + \binom{n-1}{k-1}$, leading to the recurrence

$$U_n(x,y) = xU_{n-1}(x,y) + x^n/n + yx^{n-1}; \qquad (6.75)$$

this can readily be solved with a summation factor (exercise 5).

But it’s easiest to use another trick that worked to our advantage in Chapter 2: differentiation. The derivative of $U_n(x,y)$ with respect to y brings out a k that cancels with the k in the denominator, and the resulting sum is trivial:

$$\begin{aligned}
\frac{\partial}{\partial y}U_n(x,y) &= \sum_{k\ge1}\binom nk(-1)^{k-1}n(x+ky)^{n-1}\\
&= nx^{n-1} - \sum_{k\ge0}\binom nk(-1)^k n(x+ky)^{n-1}\\
&= nx^{n-1}.
\end{aligned}$$

(Once again, the nth difference of a polynomial of degree < n has vanished.)

We’ve proved that the derivative of $U_n(x,y)$ with respect to y is $nx^{n-1}$, independent of y. In general, if $f'(y) = c$ then $f(y) = f(0) + cy$; therefore we must have $U_n(x,y) = U_n(x,0) + nx^{n-1}y$.

The remaining task is to determine $U_n(x,0)$. But $U_n(x,0)$ is just $x^n$ times the sum $T_n = H_n$ we’ve already considered in (6.72); therefore the general sum in (6.74) has the closed form

$$U_n(x,y) = x^nH_n + nx^{n-1}y. \qquad (6.76)$$

In particular, the solution to the original problem is $U_n(n,-1) = n^n(H_n - 1)$.
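The two derivations can be cross-checked mechanically. This sketch (exact fractions; names ours; sample points arbitrary, with x nonzero) sums (6.74) term by term and confirms $T_n = H_n$, the special case (6.73), the recurrence (6.75), and the closed form (6.76).

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def U(n, x, y):
    """U_n(x, y) of (6.74), summed term by term; the original U_n is U(n, n, -1)."""
    return sum((comb(n, k) * (-1) ** (k - 1) * Fraction(x + k * y) ** n / k
                for k in range(1, n + 1)), Fraction(0))


def T(n):
    """T_n of (6.72)."""
    return sum((Fraction(comb(n, k) * (-1) ** (k - 1), k) for k in range(1, n + 1)),
               Fraction(0))


def closed_form(n, x, y):
    """Right-hand side of (6.76): x^n H_n + n x^(n-1) y."""
    return Fraction(x) ** n * harmonic(n) + n * Fraction(x) ** (n - 1) * y


for n in range(1, 13):
    assert T(n) == harmonic(n)                                  # T_n = H_n
    assert U(n, n, -1) == n ** n * (harmonic(n) - 1)            # (6.73)
    for x, y in [(Fraction(2, 7), Fraction(5, 3)), (Fraction(-4), Fraction(1, 2))]:
        assert U(n, x, y) == closed_form(n, x, y)               # (6.76)
        assert U(n, x, y) == x * U(n - 1, x, y) + x ** n / n + y * x ** (n - 1)   # (6.75)
print(U(3, 3, -1), T(3))   # 45/2 11/6
```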
6.5 BERNOULLI NUMBERS

The next important sequence of numbers on our agenda is named after Jakob Bernoulli (1654–1705), who discovered curious relationships while working out the formulas for sums of mth powers [26]. Let’s write

$$S_m(n) = 0^m + 1^m + \cdots + (n-1)^m = \sum_{k=0}^{n-1}k^m = \sum\nolimits_0^n x^m\,\delta x. \qquad (6.77)$$

(Thus, when m > 0 we have $S_m(n) = H_{n-1}^{(-m)}$ in the notation of generalized harmonic numbers.) Bernoulli looked at the following sequence of formulas and spotted a pattern:

$$\begin{aligned}
S_0(n) &= n\\
S_1(n) &= \tfrac12n^2 - \tfrac12n\\
S_2(n) &= \tfrac13n^3 - \tfrac12n^2 + \tfrac16n\\
S_3(n) &= \tfrac14n^4 - \tfrac12n^3 + \tfrac14n^2\\
S_4(n) &= \tfrac15n^5 - \tfrac12n^4 + \tfrac13n^3 - \tfrac1{30}n\\
S_5(n) &= \tfrac16n^6 - \tfrac12n^5 + \tfrac5{12}n^4 - \tfrac1{12}n^2\\
S_6(n) &= \tfrac17n^7 - \tfrac12n^6 + \tfrac12n^5 - \tfrac16n^3 + \tfrac1{42}n\\
S_7(n) &= \tfrac18n^8 - \tfrac12n^7 + \tfrac7{12}n^6 - \tfrac7{24}n^4 + \tfrac1{12}n^2\\
S_8(n) &= \tfrac19n^9 - \tfrac12n^8 + \tfrac23n^7 - \tfrac7{15}n^5 + \tfrac29n^3 - \tfrac1{30}n\\
S_9(n) &= \tfrac1{10}n^{10} - \tfrac12n^9 + \tfrac34n^8 - \tfrac7{10}n^6 + \tfrac12n^4 - \tfrac3{20}n^2\\
S_{10}(n) &= \tfrac1{11}n^{11} - \tfrac12n^{10} + \tfrac56n^9 - n^7 + n^5 - \tfrac12n^3 + \tfrac5{66}n
\end{aligned}$$

Can you see it too? The coefficient of $n^{m+1}$ in $S_m(n)$ is always $1/(m+1)$. The coefficient of $n^m$ is always $-1/2$. The coefficient of $n^{m-1}$ is always ... let’s see ... $m/12$. The coefficient of $n^{m-2}$ is always zero. The coefficient of $n^{m-3}$ is always ... let’s see ... hmmm ... yes, it’s $-m(m-1)(m-2)/720$. The coefficient of $n^{m-4}$ is always zero. And it looks as if the pattern will continue, with the coefficient of $n^{m-k}$ always being some constant times $m^{\underline k}$.

That was Bernoulli’s empirical discovery. (He did not give a proof.) In modern notation we write the coefficients in the form

$$S_m(n) = \frac1{m+1}\left(B_0n^{m+1} + \binom{m+1}{1}B_1n^m + \cdots + \binom{m+1}{m}B_mn\right) = \frac1{m+1}\sum_{k=0}^m\binom{m+1}{k}B_kn^{m+1-k}. \qquad (6.78)$$
Bernoulli numbers are defined by an implicit recurrence relation,

$$\sum_{j=0}^m\binom{m+1}{j}B_j = [m=0], \qquad \text{for all } m \ge 0. \qquad (6.79)$$

For example, $\binom20B_0 + \binom21B_1 = 0$. The first few values turn out to be

$$\begin{array}{c|ccccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ \hline
B_n & 1 & -\frac12 & \frac16 & 0 & -\frac1{30} & 0 & \frac1{42} & 0 & -\frac1{30} & 0 & \frac5{66} & 0 & -\frac{691}{2730}
\end{array}$$

(All conjectures about a simple closed form for $B_n$ are wiped out by the appearance of the strange fraction $-691/2730$.)
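Recurrence (6.79) determines each $B_m$ from the earlier ones, since its j = m term has coefficient $\binom{m+1}{m} = m+1$. The following sketch (exact fractions; function names ours) computes the table above and then checks Bernoulli’s formula (6.78) against direct sums of powers.

```python
from fractions import Fraction
from math import comb


def bernoulli(N):
    """[B_0, ..., B_N] from recurrence (6.79), with the convention B_1 = -1/2."""
    B = []
    for m in range(N + 1):
        # (6.79): sum_{j<m} C(m+1, j) B_j + (m+1) B_m = [m = 0]
        s = sum((comb(m + 1, j) * B[j] for j in range(m)), Fraction(0))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


def S_formula(m, n, B):
    """Right-hand side of (6.78)."""
    return sum((comb(m + 1, k) * B[k] * n ** (m + 1 - k) for k in range(m + 1)),
               Fraction(0)) / (m + 1)


B = bernoulli(12)
print([str(b) for b in B])
for m in range(11):
    for n in range(20):
        assert S_formula(m, n, B) == sum(k ** m for k in range(n))   # 0**0 == 1 in Python
```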

We can prove Bernoulli’s formula (6.78) by induction on m, using the perturbation method (one of the ways we found $S_2(n) = \Box_n$ in Chapter 2):

$$S_{m+1}(n) + n^{m+1} = \sum_{k=0}^{n-1}(k+1)^{m+1} = \sum_{k=0}^{n-1}\sum_{j=0}^{m+1}\binom{m+1}{j}k^j = \sum_{j=0}^{m+1}\binom{m+1}{j}S_j(n). \qquad (6.80)$$

Let $\widehat S_m(n)$ be the right-hand side of (6.78); we wish to show that $S_m(n) = \widehat S_m(n)$, assuming that $S_j(n) = \widehat S_j(n)$ for $0 \le j < m$. We begin as we did for m = 2 in Chapter 2, subtracting $S_{m+1}(n)$ from both sides of (6.80). Then we expand each $S_j(n)$ using (6.78), and regroup so that the coefficients of powers of n on the right-hand side are brought together and simplified:


$$\begin{aligned}
n^{m+1} &= \sum_{j=0}^{m}\binom{m+1}{j}S_j(n) = \sum_{j=0}^{m}\binom{m+1}{j}\widehat S_j(n) + \binom{m+1}{m}\Delta\\
&= \sum_{j=0}^{m}\binom{m+1}{j}\frac{1}{j+1}\sum_{k=0}^{j}\binom{j+1}{k}B_kn^{j+1-k} + (m+1)\Delta\\
&= \sum_{0\le k\le j\le m}\binom{m+1}{j}\binom{j+1}{k+1}\frac{B_{j-k}}{j+1}n^{k+1} + (m+1)\Delta\\
&= \sum_{0\le k\le j\le m}\binom{m+1}{j}\binom{j}{k}\frac{B_{j-k}}{k+1}n^{k+1} + (m+1)\Delta\\
&= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}\sum_{k\le j\le m}\binom{m+1-k}{j-k}B_{j-k} + (m+1)\Delta\\
&= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}\sum_{0\le j\le m-k}\binom{m+1-k}{j}B_j + (m+1)\Delta\\
&= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}[m-k=0] + (m+1)\Delta\\
&= \frac{n^{m+1}}{m+1}\binom{m+1}{m} + (m+1)\Delta = n^{m+1} + (m+1)\Delta,
\end{aligned}$$

where $\Delta = S_m(n) - \widehat S_m(n)$. (This derivation is a good review of the standard manipulations we learned in Chapter 5.) Thus $\Delta = 0$ and $S_m(n) = \widehat S_m(n)$, QED.

Here’s some more neat stuff that you’ll probably want to skim through the first time.

— Friendly TA

Start

Skimming

In Chapter 7 we’ll use generating functions to obtain a much simpler proof of (6.78). The key idea will be to show that the Bernoulli numbers are the coefficients of the power series

$$\frac{z}{e^z - 1} = \sum_{n\ge0}B_n\frac{z^n}{n!}. \qquad (6.81)$$


Let’s simply assume for now that equation (6.81) holds, so that we can derive some of its amazing consequences. If we add $\frac12z$ to both sides, thereby cancelling the term $B_1z/1! = -\frac12z$ from the right, we get

$$\frac{z}{e^z-1} + \frac z2 = \frac z2\,\frac{e^z+1}{e^z-1} = \frac z2\,\frac{e^{z/2}+e^{-z/2}}{e^{z/2}-e^{-z/2}} = \frac z2\coth\frac z2. \qquad (6.82)$$

Here coth is the “hyperbolic cotangent” function, otherwise known in calculus books as $\cosh z/\sinh z$; we have

$$\sinh z = \frac{e^z - e^{-z}}{2}; \qquad \cosh z = \frac{e^z + e^{-z}}{2}. \qquad (6.83)$$

Changing z to −z gives $\left(\frac{-z}{2}\right)\coth\left(\frac{-z}{2}\right) = \frac z2\coth\frac z2$; hence every odd-numbered coefficient of $\frac z2\coth\frac z2$ must be zero, and we have

$$B_3 = B_5 = B_7 = B_9 = B_{11} = B_{13} = \cdots = 0. \qquad (6.84)$$

Furthermore (6.82) leads to a closed form for the coefficients of coth:

$$z\coth z = \frac{2z}{e^{2z}-1} + \frac{2z}{2} = \sum_{n\ge0}B_{2n}\frac{(2z)^{2n}}{(2n)!} = \sum_{n\ge0}4^nB_{2n}\frac{z^{2n}}{(2n)!}. \qquad (6.85)$$

But there isn’t much of a market for hyperbolic functions; people are more interested in the “real” functions of trigonometry. We can express ordinary trigonometric functions in terms of their hyperbolic cousins by using the rules

$$\sin z = -i\sinh iz, \qquad \cos z = \cosh iz; \qquad (6.86)$$

the corresponding power series are

$$\begin{aligned}
\sin z &= \frac{z^1}{1!} - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots, &\qquad \sinh z &= \frac{z^1}{1!} + \frac{z^3}{3!} + \frac{z^5}{5!} + \cdots;\\
\cos z &= \frac{z^0}{0!} - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots, &\qquad \cosh z &= \frac{z^0}{0!} + \frac{z^2}{2!} + \frac{z^4}{4!} + \cdots.
\end{aligned}$$

Hence $\cot z = \cos z/\sin z = i\cosh iz/\sinh iz = i\coth iz$, and we have

$$z\cot z = \sum_{n\ge0}B_{2n}\frac{(2iz)^{2n}}{(2n)!} = \sum_{n\ge0}(-4)^nB_{2n}\frac{z^{2n}}{(2n)!}. \qquad (6.87)$$
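Both (6.81) and (6.87) can be checked numerically by comparing truncated power series with the functions themselves at a sample point. In this sketch the Bernoulli numbers again come from recurrence (6.79); the truncation at $B_{30}$ and the point z = 0.5 are arbitrary choices (well inside the radii of convergence, 2π and π), and agreement to about $10^{-12}$ is all that is claimed.

```python
import math
from fractions import Fraction
from math import comb, factorial


def bernoulli(N):
    """[B_0, ..., B_N] from recurrence (6.79)."""
    B = []
    for m in range(N + 1):
        s = sum((comb(m + 1, j) * B[j] for j in range(m)), Fraction(0))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


B = bernoulli(30)
z = 0.5

# (6.81): z/(e^z - 1) = sum_{n>=0} B_n z^n / n!
series_681 = sum(float(B[n]) * z ** n / factorial(n) for n in range(31))
print(series_681, z / math.expm1(z))

# (6.87): z cot z = sum_{n>=0} (-4)^n B_{2n} z^{2n} / (2n)!
series_687 = sum((-4) ** n * float(B[2 * n]) * z ** (2 * n) / factorial(2 * n)
                 for n in range(16))
print(series_687, z / math.tan(z))
```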


n^O n^O 

Another remarkable formula for $z\cot z$ was found by Euler (exercise 73):

$$z\cot z = 1 - 2\sum_{k\ge1}\frac{z^2}{k^2\pi^2 - z^2}. \qquad (6.88)$$

We can expand Euler’s formula in powers of $z^2$, obtaining

$$\begin{aligned}
z\cot z &= 1 - 2\sum_{k\ge1}\left(\frac{z^2}{k^2\pi^2} + \frac{z^4}{k^4\pi^4} + \frac{z^6}{k^6\pi^6} + \cdots\right)\\
&= 1 - 2\left(\frac{z^2}{\pi^2}H_\infty^{(2)} + \frac{z^4}{\pi^4}H_\infty^{(4)} + \frac{z^6}{\pi^6}H_\infty^{(6)} + \cdots\right).
\end{aligned}$$

Equating coefficients of $z^{2n}$ with those in our other formula, (6.87), gives us an almost miraculous closed form for infinitely many infinite sums:

$$\zeta(2n) = H_\infty^{(2n)} = (-1)^{n-1}\frac{2^{2n-1}\pi^{2n}B_{2n}}{(2n)!}, \qquad \text{integer } n > 0. \qquad (6.89)$$

For example,

$$\zeta(2) = H_\infty^{(2)} = 1 + \frac14 + \frac19 + \cdots = \pi^2B_2 = \pi^2/6; \qquad (6.90)$$

$$\zeta(4) = H_\infty^{(4)} = 1 + \frac1{16} + \frac1{81} + \cdots = -\pi^4B_4/3 = \pi^4/90. \qquad (6.91)$$

Formula (6.89) is not only a closed form for $H_\infty^{(2n)}$; it also tells us the approximate size of $B_{2n}$, since $H_\infty^{(2n)}$ is very near 1 when n is large. And it tells us that $(-1)^{n-1}B_{2n} > 0$ for all n > 0; thus the nonzero Bernoulli numbers alternate in sign.
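A numerical sketch of (6.89): compute $\zeta(2n)$ from the Bernoulli numbers and compare with a truncated direct sum (10,000 terms, an arbitrary cutoff whose neglected tail is below $10^{-11}$ once $2n \ge 4$), and confirm the alternation of signs. Function names are ours.

```python
import math
from fractions import Fraction
from math import comb, factorial


def bernoulli(N):
    """[B_0, ..., B_N] from recurrence (6.79)."""
    B = []
    for m in range(N + 1):
        s = sum((comb(m + 1, j) * B[j] for j in range(m)), Fraction(0))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


def zeta_even(n, B):
    """zeta(2n) from (6.89): (-1)^(n-1) 2^(2n-1) pi^(2n) B_{2n} / (2n)!."""
    return ((-1) ** (n - 1) * 2 ** (2 * n - 1) * math.pi ** (2 * n)
            * float(B[2 * n]) / factorial(2 * n))


B = bernoulli(20)
print(zeta_even(1, B), math.pi ** 2 / 6)     # (6.90)
print(zeta_even(2, B), math.pi ** 4 / 90)    # (6.91)
for n in range(2, 11):
    direct = math.fsum(k ** (-2 * n) for k in range(1, 10_001))
    assert abs(zeta_even(n, B) - direct) < 1e-10
    assert (-1) ** (n - 1) * B[2 * n] > 0      # the nonzero B_{2n} alternate in sign
```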


I see, we get “real” 
functions by using 
imaginary numbers. 
Start

Skipping 


And that’s not all. Bernoulli numbers also appear in the coefficients of the tangent function,

$$\tan z = \frac{\sin z}{\cos z} = \sum_{n\ge0}(-1)^{n-1}\frac{4^n(4^n-1)B_{2n}z^{2n-1}}{(2n)!}, \qquad (6.92)$$

as well as other trigonometric functions (exercise 72). Formula (6.92) leads to another important fact about the Bernoulli numbers, namely that

$$T_{2n-1} = (-1)^{n-1}\frac{4^n(4^n-1)}{2n}B_{2n} \quad \text{is a positive integer.} \qquad (6.93)$$

We have, for example:

$$\begin{array}{c|ccccccc}
n & 1 & 3 & 5 & 7 & 9 & 11 & 13\\ \hline
T_n & 1 & 2 & 16 & 272 & 7936 & 353792 & 22368256
\end{array}$$

(The T’s are called tangent numbers.)

One way to prove (6.93), following an idea of B. F. Logan, is to consider the power series

$$\frac{\sin z + x\cos z}{\cos z - x\sin z} = x + (1+x^2)z + (2x^3+2x)\frac{z^2}{2!} + (6x^4+8x^2+2)\frac{z^3}{3!} + \cdots = \sum_{n\ge0}T_n(x)\frac{z^n}{n!}, \qquad (6.94)$$

where $T_n(x)$ is a polynomial in x; setting x = 0 gives $T_n(0) = T_n$, the nth tangent number. If we differentiate (6.94) with respect to x, we get

$$\frac{1}{(\cos z - x\sin z)^2} = \sum_{n\ge0}T_n'(x)\frac{z^n}{n!};$$

but if we differentiate with respect to z, we get

$$\frac{1+x^2}{(\cos z - x\sin z)^2} = \sum_{n\ge1}T_n(x)\frac{z^{n-1}}{(n-1)!} = \sum_{n\ge0}T_{n+1}(x)\frac{z^n}{n!}.$$

(Try it; the cancellation is very pretty.) Therefore we have

$$T_{n+1}(x) = (1+x^2)\,T_n'(x), \qquad T_0(x) = x, \qquad (6.95)$$

a simple recurrence from which it follows that the coefficients of $T_n(x)$ are nonnegative integers. Moreover, we can easily prove that $T_n(x)$ has degree n + 1, and that its coefficients are alternately zero and positive. Therefore $T_{2n+1}(0) = T_{2n+1}$ is a positive integer, as claimed in (6.93).

When x = tan w, the series (6.94) is tan(z + w). Hence, by Taylor’s theorem, the nth derivative of tan w is $T_n(\tan w)$.
Recurrence (6.95) gives us a simple way to calculate Bernoulli numbers, via tangent numbers, using only simple operations on integers; by contrast, the defining recurrence (6.79) involves difficult arithmetic with fractions.
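Here is one way to carry out that integer-only computation, as a sketch: represent $T_n(x)$ by its list of integer coefficients, apply (6.95) repeatedly, read off $T_{2n-1} = T_{2n-1}(0)$, and then solve (6.93) for $B_{2n}$. Only that final division produces a fraction; the function names are ours.

```python
from fractions import Fraction


def tangent_polys(N):
    """Coefficient lists of T_0(x), ..., T_N(x) via T_{n+1}(x) = (1 + x^2) T_n'(x)."""
    polys = [[0, 1]]                                      # T_0(x) = x
    for _ in range(N):
        p = polys[-1]
        deriv = [i * p[i] for i in range(1, len(p))]      # coefficients of T_n'(x)
        nxt = [0] * (len(deriv) + 2)
        for i, c in enumerate(deriv):                     # multiply by 1 + x^2
            nxt[i] += c
            nxt[i + 2] += c
        polys.append(nxt)
    return polys


def bernoulli_even_from_tangent(n_max):
    """B_2, B_4, ..., B_{2 n_max}, obtained by solving (6.93) for B_{2n}."""
    polys = tangent_polys(2 * n_max - 1)
    result = []
    for n in range(1, n_max + 1):
        t = polys[2 * n - 1][0]                           # T_{2n-1} = T_{2n-1}(0)
        result.append(Fraction((-1) ** (n - 1) * 2 * n * t, 4 ** n * (4 ** n - 1)))
    return result


print([tangent_polys(13)[k][0] for k in range(1, 14, 2)])  # [1, 2, 16, 272, 7936, 353792, 22368256]
print(bernoulli_even_from_tangent(6))
```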

If we want to compute the sum of mth powers from a to b − 1 instead of from 0 to n − 1, the theory of Chapter 2 tells us that

$$\sum_{k=a}^{b-1}k^m = \sum\nolimits_a^b x^m\,\delta x = S_m(b) - S_m(a). \qquad (6.96)$$

This identity has interesting consequences when we consider negative values of k: We have

$$\sum_{k=-n+1}^{-1}k^m = (-1)^m\sum_{k=0}^{n-1}k^m, \qquad \text{when } m > 0,$$

hence

$$S_m(0) - S_m(-n+1) = (-1)^m\bigl(S_m(n) - S_m(0)\bigr).$$

But $S_m(0) = 0$, so we have the identity

$$S_m(1-n) = (-1)^{m+1}S_m(n), \qquad m > 0. \qquad (6.97)$$

Therefore $S_m(1) = 0$ (set n = 0 in (6.97)). If we write the polynomial $S_m(n)$ in factored form, it will always have the factors n and (n − 1), because it has the roots 0 and 1. In general, $S_m(n)$ is a polynomial of degree m + 1 with leading term $\frac1{m+1}n^{m+1}$. Moreover, we can set $n = \frac12$ in (6.97) to deduce that $S_m(\frac12) = (-1)^{m+1}S_m(\frac12)$; if m is even, this makes $S_m(\frac12) = 0$, so $(n - \frac12)$ will be an additional factor. These observations explain why we found the simple factorization

$$S_2(n) = \tfrac13n\left(n - \tfrac12\right)(n-1)$$

in Chapter 2; we could have used such reasoning to deduce the value of $S_2(n)$ without calculating it! Furthermore, (6.97) implies that the polynomial with the remaining factors, $\widehat S_m(n) = S_m(n)/(n - \frac12)$, always satisfies

$$\widehat S_m(1-n) = \widehat S_m(n), \qquad m \text{ even}, \; m > 0.$$


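It follows that $S_m(n)$ can always be written in the factored form

$$S_m(n) = \begin{cases}
\dfrac{1}{m+1}\displaystyle\prod_{k=1}^{\lceil m/2\rceil}\left(n - \tfrac12 - \alpha_k\right)\left(n - \tfrac12 + \alpha_k\right), & m \text{ odd};\\[2ex]
\dfrac{n - \frac12}{m+1}\displaystyle\prod_{k=1}^{m/2}\left(n - \tfrac12 - \alpha_k\right)\left(n - \tfrac12 + \alpha_k\right), & m \text{ even}.
\end{cases} \qquad (6.98)$$

(Johann Faulhaber implicitly used (6.97) in 1635 [119] to find simple formulas for $S_m(n)$ as polynomials in n(n + 1)/2 when $m \le 17$; see [222].)

Here $\alpha_1 = \frac12$, and $\alpha_2, \ldots, \alpha_{\lceil m/2\rceil}$ are appropriate complex numbers whose values depend on m. For example,

Stop

Skipping

$$\begin{aligned}
S_3(n) &= n^2(n-1)^2/4;\\
S_4(n) &= n\left(n-\tfrac12\right)(n-1)\left(n-\tfrac12+\sqrt{7/12}\right)\left(n-\tfrac12-\sqrt{7/12}\right)/5;\\
S_5(n) &= n^2(n-1)^2\left(n-\tfrac12+\sqrt{3/4}\right)\left(n-\tfrac12-\sqrt{3/4}\right)/6;\\
S_6(n) &= n\left(n-\tfrac12\right)(n-1)\left(n-\tfrac12+\alpha\right)\left(n-\tfrac12-\alpha\right)\left(n-\tfrac12+\bar\alpha\right)\left(n-\tfrac12-\bar\alpha\right)/7,
\end{aligned}$$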

where $\alpha = 2^{-3/2}3^{-1/4}\left(\sqrt{\sqrt{31}+\sqrt{27}} + i\sqrt{\sqrt{31}-\sqrt{27}}\right)$.

If m is odd and greater than 1, we have $B_m = 0$; hence $S_m(n)$ is divisible by $n^2$ (and by $(n-1)^2$). Otherwise the roots of $S_m(n)$ don’t seem to obey a simple law.
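Because $S_m(n)$ is a polynomial, (6.97) and the extra root at $n = \frac12$ for even m can be tested at non-integer arguments. A sketch that evaluates the right-hand side of (6.78) with exact fractions (Bernoulli numbers from (6.79) again; helper names and sample points ours):

```python
from fractions import Fraction
from math import comb


def bernoulli(N):
    """[B_0, ..., B_N] from recurrence (6.79)."""
    B = []
    for m in range(N + 1):
        s = sum((comb(m + 1, j) * B[j] for j in range(m)), Fraction(0))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


B = bernoulli(12)


def S(m, n):
    """The polynomial S_m(n) of (6.78), evaluated at any rational n."""
    return sum((comb(m + 1, k) * B[k] * n ** (m + 1 - k) for k in range(m + 1)),
               Fraction(0)) / (m + 1)


half = Fraction(1, 2)
points = [Fraction(p, 7) for p in range(-20, 21)]
for m in range(1, 12):
    assert S(m, 0) == 0 and S(m, 1) == 0                                   # roots 0 and 1
    assert all(S(m, 1 - n) == (-1) ** (m + 1) * S(m, n) for n in points)  # (6.97)
    if m % 2 == 0:
        assert S(m, half) == 0                                             # factor (n - 1/2)
print(S(2, Fraction(3)), 3 * (3 - half) * (3 - 1) / 3)   # both 5
```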

Let’s conclude our study of Bernoulli numbers by looking at how they relate to Stirling numbers. One way to compute $S_m(n)$ is to change ordinary powers to falling powers, since the falling powers have easy sums. After doing those easy sums we can convert back to ordinary powers:

$$S_m(n) = \sum_{k=0}^{n-1}k^m = \sum_{k=0}^{n-1}\sum_{j\ge0}{m\brace j}k^{\underline j} = \sum_{j\ge0}{m\brace j}\frac{n^{\underline{j+1}}}{j+1} = \sum_{j\ge0}{m\brace j}\frac{1}{j+1}\sum_k{j+1\brack k}(-1)^{j+1-k}n^k.$$

Therefore, equating coefficients with those in (6.78), we must have the identity

$$\sum_{j\ge0}{m\brace j}{j+1\brack k}\frac{(-1)^{j+1-k}}{j+1} = \frac{1}{m+1}\binom{m+1}{k}B_{m+1-k}, \qquad k > 0. \qquad (6.99)$$

It would be nice to prove this relation directly, thereby discovering Bernoulli numbers in a new way. But the identities in Tables 264 or 265 don’t give us any obvious handle on a proof by induction that the left-hand sum in (6.99) is a constant times $m^{\underline{m-k}}$. If k = m + 1, the left-hand sum is just ${m\brace m}{m+1\brack m+1}/(m+1) = 1/(m+1)$, so that case is easy. And if k = m, the left-hand side sums to ${m\brace m-1}{m\brack m}\frac1m - {m\brace m}{m+1\brack m}\frac1{m+1} = \frac12(m-1) - \frac12m = -\frac12$; so that case is pretty easy too. But if k < m, the left-hand sum looks hairy. Bernoulli would probably not have discovered his numbers if he had taken this route.
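Although a direct proof is elusive, identity (6.99) is straightforward to test. The sketch below builds both kinds of Stirling numbers from their basic recurrences, ${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$ and ${n\brack k} = (n-1){n-1\brack k} + {n-1\brack k-1}$, and compares the two sides for $1 \le k \le m+1$. (For k = 0 the left side vanishes while $B_{m+1}$ need not, which is why only k > 0 is tested.) Function names are ours.

```python
from fractions import Fraction
from math import comb


def stirling_tables(N):
    """Return (subset, cycle) with subset[n][k] = {n brace k}, cycle[n][k] = [n brack k]."""
    subset = [[0] * (N + 1) for _ in range(N + 1)]
    cycle = [[0] * (N + 1) for _ in range(N + 1)]
    subset[0][0] = cycle[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            subset[n][k] = k * subset[n - 1][k] + subset[n - 1][k - 1]
            cycle[n][k] = (n - 1) * cycle[n - 1][k] + cycle[n - 1][k - 1]
    return subset, cycle


def bernoulli(N):
    """[B_0, ..., B_N] from recurrence (6.79)."""
    B = []
    for m in range(N + 1):
        s = sum((comb(m + 1, j) * B[j] for j in range(m)), Fraction(0))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


def sides_699(m, k, subset, cycle, B):
    """Left and right sides of (6.99) for 1 <= k <= m + 1."""
    # [j+1 brack k] vanishes when j + 1 < k, so start at j = k - 1 (sign exponent stays >= 0).
    left = sum((Fraction(subset[m][j] * cycle[j + 1][k] * (-1) ** (j + 1 - k), j + 1)
                for j in range(k - 1, m + 1)), Fraction(0))
    right = Fraction(comb(m + 1, k), m + 1) * B[m + 1 - k]
    return left, right


M = 10
subset, cycle = stirling_tables(M + 1)
B = bernoulli(M)
for m in range(M + 1):
    for k in range(1, m + 2):
        left, right = sides_699(m, k, subset, cycle, B)
        assert left == right, (m, k)
print("(6.99) holds for 0 <= m <= 10 and 1 <= k <= m + 1")
```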
One thing we can do is replace ${m\brace j}$ by ${m+1\brace j+1} - (j+1){m\brace j+1}$. The (j + 1) nicely cancels with the awkward denominator, and the left-hand side becomes

$$\sum_{j\ge0}{m+1\brace j+1}{j+1\brack k}\frac{(-1)^{j+1-k}}{j+1} - \sum_{j\ge0}{m\brace j+1}{j+1\brack k}(-1)^{j+1-k}.$$

The second sum is zero, when k < m, by (6.31). That leaves us with the first sum, which cries out for a change in notation; let’s rename all variables so that the index of summation is k, and so that the other parameters are m and n. Then identity (6.99) is equivalent to

$$\sum_k{n\brace k}{k\brack m}\frac{(-1)^{k-m}}{k} = \frac1n\binom nmB_{n-m} + [m = n-1]. \qquad (6.100)$$

Good, we have something that looks more pleasant, although Table 265 still doesn’t suggest any obvious next step.

The convolution formulas in Table 272 now come to the rescue. We can use (6.49) and (6.48) to rewrite the summand in terms of Stirling polynomials:



$$\begin{aligned}
{n\brace k}{k\brack m}\frac{(-1)^{k-m}}{k} &= (-1)^{n-k+1}\frac{n!}{(k-1)!}\sigma_{n-k}(-k)\cdot\frac{k!}{(m-1)!}\sigma_{k-m}(k)\cdot\frac{(-1)^{k-m}}{k}\\
&= (-1)^{n+1-m}\frac{n!}{(m-1)!}\sigma_{n-k}(-k)\,\sigma_{k-m}(k).
\end{aligned}$$

Things are looking up; the convolution in (6.46), with t = 1, yields

$$\begin{aligned}
\sum_{k=0}^n\sigma_{n-k}(-k)\,\sigma_{k-m}(k) &= \sum_{k=0}^{n-m}\sigma_{n-m-k}\bigl(-n+(n-m-k)\bigr)\,\sigma_k(m+k)\\
&= \frac{m-n}{(m)(-n)}\,\sigma_{n-m}(m-n+n-m).
\end{aligned}$$

Formula (6.100) is now verified, and we find that Bernoulli numbers are related to the constant terms in the Stirling polynomials:

$$\frac{B_m}{m!} = -m\,\sigma_m(0). \qquad (6.101)$$


Stop 

Skimming


6.6 FIBONACCI NUMBERS 

Now we come to a special sequence of numbers that is perhaps the most pleasant of all, the Fibonacci sequence $\langle F_n\rangle$:

$$\begin{array}{c|ccccccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14\\ \hline
F_n & 0 & 1 & 1 & 2 & 3 & 5 & 8 & 13 & 21 & 34 & 55 & 89 & 144 & 233 & 377
\end{array}$$

Unlike the harmonic numbers and the Bernoulli numbers, the Fibonacci numbers are nice simple integers. They are defined by the recurrence

$$F_0 = 0; \qquad F_1 = 1; \qquad F_n = F_{n-1} + F_{n-2}, \quad \text{for } n > 1. \qquad (6.102)$$


The back-to-nature 
nature of this ex- 
ample is shocking. 
This book should be 
banned. 


The simplicity of this rule — the simplest possible recurrence in which each 
number depends on the previous two — accounts for the fact that Fibonacci 
numbers occur in a wide variety of situations. 

“Bee trees” provide a good example of how Fibonacci numbers can arise 
naturally. Let’s consider the pedigree of a male bee. Each male (also known 
as a drone) is produced asexually from a female (also known as a queen); each 
female, however, has two parents, a male and a female. Here are the first few 
levels of the tree: 



Phyllotaxis, n.
The love of taxis. 


The drone has one grandfather and one grandmother; he has one great-grandfather and two great-grandmothers; he has two great-great-grandfathers and three great-great-grandmothers. In general, it is easy to see by induction that he has exactly $F_{n+1}$ great$^n$-grandpas and $F_{n+2}$ great$^n$-grandmas.
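The induction can be mirrored in a few lines: if one generation of ancestors contains `males` drones and `females` queens, the generation before it has one father per female and one mother per bee. A sketch (our own names) that counts ancestors generation by generation and compares with $F_{n+1}$ and $F_{n+2}$:

```python
def fib(n):
    """F_n via the recurrence (6.102)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def ancestors(generations):
    """(males, females) in each generation back from a single drone."""
    males, females = 1, 0            # generation 0: the drone himself
    counts = [(males, females)]
    for _ in range(generations):
        # each female has a father and a mother; each male has only a mother
        males, females = females, males + females
        counts.append((males, females))
    return counts


counts = ancestors(20)
for n in range(19):
    # generation n + 2 holds the great^n-grandparents
    assert counts[n + 2] == (fib(n + 1), fib(n + 2))
print(counts[:6])   # [(1, 0), (0, 1), (1, 1), (1, 2), (2, 3), (3, 5)]
```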

Fibonacci numbers are often found in nature, perhaps for reasons similar 
to the bee-tree law. For example, a typical sunflower has a large head that 
contains spirals of tightly packed florets, usually with 34 winding in one di- 
rection and 55 in another. Smaller heads will have 21 and 34, or 13 and 21; 
a gigantic sunflower with 89 and 144 spirals was once exhibited in England. 
Similar patterns are found in some species of pine cones. 

And here’s an example of a different nature [277]: Suppose we put two 
panes of glass back-to-back. How many ways a n are there for light rays to 
pass through or be reflected after changing direction n times? The first few 
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a 0 = 1 a-[=2 a.2=3 a 3 =5 


When n is even, we have an even number of bounces and the ray passes
through; when n is odd, the ray is reflected and it re-emerges on the same
side it entered. The $a_n$’s seem to be Fibonacci numbers, and a little staring
at the figure tells us why: For $n \ge 2$, the n-bounce rays either take their first
bounce off the opposite surface and continue in $a_{n-1}$ ways, or they begin
by bouncing off the middle surface and then bouncing back again to finish
in $a_{n-2}$ ways. Thus we have the Fibonacci recurrence $a_n = a_{n-1} + a_{n-2}$.
The initial conditions are different, but not very different, because we have
$a_0 = 1 = F_2$ and $a_1 = 2 = F_3$; therefore everything is simply shifted two
places, and $a_n = F_{n+2}$.
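To see the recurrence $a_n = a_{n-1} + a_{n-2}$ emerge without the figure, here is a brute-force count in Python. It assumes a model that the text leaves to the figure: three reflecting surfaces (top, middle, bottom), a ray entering through the top, and a free choice at every surface it meets between passing through and reflecting. The function names `ways` and `a` are hypothetical.

```python
from functools import lru_cache


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@lru_cache(maxsize=None)
def ways(pane, down, k):
    """Count paths that leave the glass after exactly k more reflections.

    The ray is inside `pane` (1 = upper, 2 = lower), travelling downward
    if `down` is True, and is about to meet the next surface.
    """
    if k < 0:
        return 0
    # Reflect here: one bounce used, same pane, opposite direction.
    total = ways(pane, not down, k - 1)
    outer = (pane == 2) if down else (pane == 1)
    if outer:
        # Pass through an outside surface: the ray escapes.
        total += 1 if k == 0 else 0
    else:
        # Pass through the middle surface into the other pane.
        total += ways(2 if down else 1, down, k)
    return total


def a(n):
    """Number of light paths with n changes of direction (ray enters at the top)."""
    return ways(1, True, n)


print([a(n) for n in range(8)])   # [1, 2, 3, 5, 8, 13, 21, 34]
```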

Leonardo Fibonacci introduced these numbers in 1202, and mathemati- 
cians gradually began to discover more and more interesting things about 
them. Edouard Lucas, the perpetrator of the Tower of Hanoi puzzle dis- 
cussed in Chapter 1, worked with them extensively in the last half of the nine- 
teenth century (in fact it was Lucas who popularized the name “Fibonacci 
numbers”). One of his amazing results was to use properties of Fibonacci 
numbers to prove that the 39-digit Mersenne number $2^{127} - 1$ is prime.

One of the oldest theorems about Fibonacci numbers, due to the French
astronomer Jean-Dominique Cassini in 1680 [51], is the identity

$$F_{n+1}F_{n-1} - F_n^2 = (-1)^n, \qquad \text{for } n > 0. \tag{6.103}$$

When n = 6, for example, Cassini’s identity correctly claims that $13 \cdot 5 - 8^2 = 1$.
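A quick numerical check of Cassini’s identity, and of the 64-versus-65 count behind Lewis Carroll’s chessboard paradox discussed below, might look like this Python sketch (`fib` and `cassini` are illustrative names).

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def cassini(n):
    """Left-hand side of (6.103): F_{n+1} F_{n-1} - F_n^2, for n > 0."""
    return fib(n + 1) * fib(n - 1) - fib(n) ** 2


print([cassini(n) for n in range(1, 9)])   # [-1, 1, -1, 1, -1, 1, -1, 1]
# Chessboard paradox: a 5 x 13 rectangle versus an 8 x 8 square.
print(fib(5) * fib(7), fib(6) ** 2)        # 65 64
```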

A polynomial formula that involves Fibonacci numbers of the form $F_{n\pm k}$
for small values of k can be transformed into a formula that involves only $F_n$
and $F_{n+1}$, because we can use the rule

$$F_m = F_{m+2} - F_{m+1} \tag{6.104}$$

to express $F_m$ in terms of higher Fibonacci numbers when $m < n$, and we can
use

$$F_m = F_{m-2} + F_{m-1} \tag{6.105}$$

to replace $F_m$ by lower Fibonacci numbers when $m > n+1$. Thus, for example,
we can replace $F_{n-1}$ by $F_{n+1} - F_n$ in (6.103) to get Cassini’s identity in the
form

$$F_{n+1}^2 - F_{n+1}F_n - F_n^2 = (-1)^n. \tag{6.106}$$

“La suite de Fibonacci possède des propriétés nombreuses fort intéressantes.”
(The Fibonacci sequence has a great many very interesting properties.)

— E. Lucas [259]

Moreover, Cassini’s identity reads

$$F_{n+2}F_n - F_{n+1}^2 = (-1)^{n+1}$$

when n is replaced by n + 1; this is the same as $(F_{n+1} + F_n)F_n - F_{n+1}^2 =
(-1)^{n+1}$, which is the same as (6.106). Thus Cassini(n) is true if and only if
Cassini(n+1) is true; equation (6.103) holds for all n by induction.

Cassini’s identity is the basis of a geometrical paradox that was one of 
Lewis Carroll’s favorite puzzles [63], [319], [364]. The idea is to take a chess- 
board and cut it into four pieces as shown here, then to reassemble the pieces 
into a rectangle: 



The paradox is explained because . . . well, magic tricks aren’t supposed to be explained.


Presto: The original area of 8 × 8 = 64 squares has been rearranged to yield
5 × 13 = 65 squares! A similar construction dissects any $F_n \times F_n$ square
into four pieces, using $F_{n+1}$, $F_n$, $F_{n-1}$, and $F_{n-2}$ as dimensions wherever the
illustration has 13, 8, 5, and 3 respectively. The result is an $F_{n-1} \times F_{n+1}$
rectangle; by (6.103), one square has therefore been gained or lost, depending
on whether n is even or odd.

Strictly speaking, we can’t apply the reduction (6.105) unless $m \ge 2$,
because we haven’t defined $F_n$ for negative n. A lot of maneuvering becomes
easier if we eliminate this boundary condition and use (6.104) and (6.105) to
define Fibonacci numbers with negative indices. For example, $F_{-1}$ turns out
to be $F_1 - F_0 = 1$; then $F_{-2}$ is $F_0 - F_{-1} = -1$. In this way we deduce the
values

$$\begin{array}{c|cccccccccccc}
n & 0 & -1 & -2 & -3 & -4 & -5 & -6 & -7 & -8 & -9 & -10 & -11\\ \hline
F_n & 0 & 1 & -1 & 2 & -3 & 5 & -8 & 13 & -21 & 34 & -55 & 89
\end{array}$$

and it quickly becomes clear (by induction) that

$$F_{-n} = (-1)^{n-1}F_n, \qquad \text{integer } n. \tag{6.107}$$

Cassini’s identity (6.103) is true for all integers n, not just for n > 0, when
we extend the Fibonacci sequence in this way.
The process of reducing $F_{n\pm k}$ to a combination of $F_n$ and $F_{n+1}$ by using
(6.105) and (6.104) leads to the sequence of formulas

$$\begin{aligned}
F_{n+2} &= F_{n+1} + F_n, & F_{n-1} &= F_{n+1} - F_n,\\
F_{n+3} &= 2F_{n+1} + F_n, & F_{n-2} &= -F_{n+1} + 2F_n,\\
F_{n+4} &= 3F_{n+1} + 2F_n, & F_{n-3} &= 2F_{n+1} - 3F_n,\\
F_{n+5} &= 5F_{n+1} + 3F_n, & F_{n-4} &= -3F_{n+1} + 5F_n,
\end{aligned}$$


in which another pattern becomes obvious: 

$$F_{n+k} = F_kF_{n+1} + F_{k-1}F_n. \tag{6.108}$$

This identity, easily proved by induction, holds for all integers k and n (positive,
negative, or zero).
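The extension to negative indices, and identity (6.108) over all integers, are easy to test numerically. This Python sketch relies only on the rules (6.104) and (6.105): it steps upward with (6.105) for n ≥ 0 and downward with (6.104) for n < 0. The name `fib` is illustrative.

```python
def fib(n):
    """F_n for every integer n, extending the sequence with (6.104) and (6.105)."""
    a, b = 0, 1                  # (F_0, F_1)
    if n >= 0:
        for _ in range(n):
            a, b = b, a + b      # (6.105): step upward
        return a
    for _ in range(-n):
        a, b = b - a, a          # (6.104): (F_m, F_{m+1}) -> (F_{m-1}, F_m)
    return a


print([fib(-n) for n in range(12)])
# [0, 1, -1, 2, -3, 5, -8, 13, -21, 34, -55, 89]

# (6.108) on a few mixed-sign indices: F_{n+k} = F_k F_{n+1} + F_{k-1} F_n
for n, k in [(5, 3), (-4, 7), (6, -9)]:
    print(fib(n + k), fib(k) * fib(n + 1) + fib(k - 1) * fib(n))
```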

If we set k = n in (6.108), we find that

$$F_{2n} = F_nF_{n+1} + F_{n-1}F_n; \tag{6.109}$$

hence $F_{2n}$ is a multiple of $F_n$. Similarly,

$$F_{3n} = F_{2n}F_{n+1} + F_{2n-1}F_n,$$

and we may conclude that $F_{3n}$ is also a multiple of $F_n$. By induction,

$$F_{kn} \text{ is a multiple of } F_n, \tag{6.110}$$

for all integers k and n. This explains, for example, why $F_{15}$ (which equals
610) is a multiple of both $F_3$ and $F_5$ (which are equal to 2 and 5). Even more
is true, in fact; exercise 27 proves that

$$\gcd(F_m, F_n) = F_{\gcd(m,n)}. \tag{6.111}$$

For example, $\gcd(F_{12}, F_{18}) = \gcd(144, 2584) = 8 = F_6$.
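Identities (6.110) and (6.111), and the converse proved next, can be spot-checked with Python’s `math.gcd`; the `fib` helper below is an illustrative name, not from the text.

```python
from math import gcd


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(gcd(fib(12), fib(18)), fib(6))   # 8 8

# (6.111) on a small grid of indices
print(all(gcd(fib(m), fib(n)) == fib(gcd(m, n)) for m in range(40) for n in range(40)))
```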

We can now prove a converse of (6.110): If $n > 2$ and if $F_m$ is a multiple
of $F_n$, then m is a multiple of n. For if $F_n \backslash F_m$ then $F_n \backslash \gcd(F_m, F_n) =
F_{\gcd(m,n)} \le F_n$. This is possible only if $F_{\gcd(m,n)} = F_n$; and our assumption
that $n > 2$ makes it mandatory that $\gcd(m, n) = n$. Hence $n \backslash m$.

An extension of these divisibility ideas was used by Yuri Matijasevich in
his famous proof [266] that there is no algorithm to decide if a given multivariate
polynomial equation with integer coefficients has a solution in integers.
Matijasevich’s lemma states that, if $n > 2$, the Fibonacci number $F_m$ is a
multiple of $F_n^2$ if and only if m is a multiple of $nF_n$.

Let’s prove this by looking at the sequence $\langle F_{kn} \bmod F_n^2\rangle$ for k = 1, 2,
3, . . . , and seeing when $F_{kn} \bmod F_n^2 = 0$. (We know that m must have the
form kn if $F_m \bmod F_n = 0$.) First we have $F_n \bmod F_n^2 = F_n$; that’s not zero.
Next we have

$$F_{2n} = F_nF_{n+1} + F_{n-1}F_n \equiv 2F_nF_{n+1} \pmod{F_n^2},$$

by (6.108), since $F_{n+1} \equiv F_{n-1} \pmod{F_n}$. Similarly

$$F_{2n+1} = F_{n+1}^2 + F_n^2 \equiv F_{n+1}^2 \pmod{F_n^2}.$$

This congruence allows us to compute

$$\begin{aligned}
F_{3n} &= F_{2n+1}F_n + F_{2n}F_{n-1} \equiv F_{n+1}^2F_n + (2F_nF_{n+1})F_{n+1} = 3F_{n+1}^2F_n \pmod{F_n^2};\\
F_{3n+1} &= F_{2n+1}F_{n+1} + F_{2n}F_n \equiv F_{n+1}^3 + (2F_nF_{n+1})F_n \equiv F_{n+1}^3 \pmod{F_n^2}.
\end{aligned}$$

In general, we find by induction on k that

$$F_{kn} \equiv kF_nF_{n+1}^{k-1} \quad\text{and}\quad F_{kn+1} \equiv F_{n+1}^k \pmod{F_n^2}.$$

Now $F_{n+1}$ is relatively prime to $F_n$, so

$$F_{kn} \equiv 0 \pmod{F_n^2} \iff kF_n \equiv 0 \pmod{F_n^2} \iff k \equiv 0 \pmod{F_n}.$$

We have proved Matijasevich’s lemma.
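A small brute-force check of Matijasevich’s lemma and of the two congruences used in its proof, in Python. The function names are hypothetical, and the ranges one would test are limited only to keep the run short.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def matijasevich_holds(n, m):
    """Do both sides of Matijasevich's lemma agree for this n > 2 and m >= 1?"""
    square = fib(n) ** 2
    return (fib(m) % square == 0) == (m % (n * fib(n)) == 0)


def congruences_hold(n, k):
    """Check F_{kn} = k F_n F_{n+1}^(k-1) and F_{kn+1} = F_{n+1}^k (mod F_n^2)."""
    square = fib(n) ** 2
    first = fib(k * n) % square == (k * fib(n) * fib(n + 1) ** (k - 1)) % square
    second = fib(k * n + 1) % square == fib(n + 1) ** k % square
    return first and second


# F_5 = 5, so F_m is a multiple of 25 exactly when m is a multiple of 5 * 5 = 25.
print([m for m in range(1, 80) if fib(m) % 25 == 0])   # [25, 50, 75]
```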

One of the most important properties of the Fibonacci numbers is the 
special way in which they can be used to represent integers. Let’s write 

$$j \gg k \iff j \ge k + 2. \tag{6.112}$$

Then every positive integer has a unique representation of the form

$$n = F_{k_1} + F_{k_2} + \cdots + F_{k_r}, \qquad k_1 \gg k_2 \gg \cdots \gg k_r \gg 0. \tag{6.113}$$

(This is “Zeckendorf’s theorem” [246], [381].) For example, the representation
of one million turns out to be

$$\begin{aligned}
1000000 &= 832040 + 121393 + 46368 + 144 + 55\\
&= F_{30} + F_{26} + F_{24} + F_{12} + F_{10}.
\end{aligned}$$

We can always find such a representation by using a “greedy” approach,
choosing $F_{k_1}$ to be the largest Fibonacci number $\le n$, then choosing $F_{k_2}$
to be the largest that is $\le n - F_{k_1}$, and so on. (More precisely, suppose that
$F_k \le n < F_{k+1}$; then we have $0 \le n - F_k < F_{k+1} - F_k = F_{k-1}$. If n is a
Fibonacci number, (6.113) holds with r = 1 and $k_1 = k$. Otherwise $n - F_k$
has a Fibonacci representation $F_{k_2} + \cdots + F_{k_r}$, by induction on n; and (6.113)
holds if we set $k_1 = k$, because the inequalities $F_{k_2} \le n - F_k < F_{k-1}$ imply
that $k \gg k_2$.) Conversely, any representation of the form (6.113) implies that

$$F_{k_1} \le n < F_{k_1+1},$$

because the largest possible value of $F_{k_2} + \cdots + F_{k_r}$ when $k \gg k_2 \gg \cdots \gg k_r \gg 0$ is

$$F_{k-2} + F_{k-4} + \cdots + F_{k \bmod 2 + 2} = F_{k-1} - 1, \qquad \text{if } k \ge 2. \tag{6.114}$$

(This formula is easy to prove by induction on k; the left-hand side is zero
when k is 2 or 3.) Therefore $k_1$ is the greedily chosen value described earlier,
and the representation must be unique.
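The greedy construction translates directly into code. This Python sketch (the name `zeckendorf` is illustrative) returns the indices $k_1 > k_2 > \cdots > k_r$ of (6.113) and reproduces the representation of one million given above.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def zeckendorf(n):
    """Greedy Fibonacci representation (6.113) of n >= 0.

    Returns the indices k_1 > k_2 > ... > k_r (an empty list for n = 0).
    """
    # Collect (k, F_k) for k = 2, 3, ... while F_k <= n.
    table = []
    k, f, f_next = 2, 1, 2            # F_2 = 1, F_3 = 2
    while f <= n:
        table.append((k, f))
        k, f, f_next = k + 1, f_next, f + f_next
    indices = []
    for k, f in reversed(table):      # largest Fibonacci number first
        if f <= n:
            indices.append(k)
            n -= f
    return indices


print(zeckendorf(1000000))   # [30, 26, 24, 12, 10]
```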

Any unique system of representation is a number system; therefore Zeckendorf’s
theorem leads to the Fibonacci number system. We can represent
any nonnegative integer n as a sequence of 0’s and 1’s, writing

$$n = (b_mb_{m-1}\ldots b_2)_F = \sum_{k=2}^{m} b_kF_k. \tag{6.115}$$

This number system is something like binary (radix 2) notation, except that
there never are two adjacent 1’s. For example, here are the numbers from 1
to 20, expressed Fibonacci-wise:


$$\begin{array}{llll}
1 = (000001)_F & 6 = (001001)_F & 11 = (010100)_F & 16 = (100100)_F\\
2 = (000010)_F & 7 = (001010)_F & 12 = (010101)_F & 17 = (100101)_F\\
3 = (000100)_F & 8 = (010000)_F & 13 = (100000)_F & 18 = (101000)_F\\
4 = (000101)_F & 9 = (010001)_F & 14 = (100001)_F & 19 = (101001)_F\\
5 = (001000)_F & 10 = (010010)_F & 15 = (100010)_F & 20 = (101010)_F
\end{array}$$


The Fibonacci representation of a million, shown a minute ago, can be contrasted
with its binary representation $2^{19} + 2^{18} + 2^{17} + 2^{16} + 2^{14} + 2^9 + 2^6$:

$$\begin{aligned}
(1000000)_{10} &= (10001010000000000010100000000)_F\\
&= (11110100001001000000)_2.
\end{aligned}$$

The Fibonacci representation needs a few more bits because adjacent 1’s are
not permitted; but the two representations are analogous.

“Sit $1 + x + 2xx + 3x^3 + 5x^4 + 8x^5 + 13x^6 + 21x^7 + 34x^8$ &c Series nata ex divisione Unitatis per Trinomium $1 - x - xx$.”
(Let $1 + x + 2xx + 3x^3 + \cdots$ be the series that arises from dividing unity by the trinomial $1 - x - xx$.)

— A. de Moivre [76]

“The quantities r, s, t, which show the relation of the terms, are the same as those in the denominator of the fraction. This property, howsoever obvious it may be, M. DeMoivre was the first that applied it to use, in the solution of problems about infinite series, which otherwise would have been very intricate.”

— J. Stirling [343]

To add 1 in the Fibonacci number system, there are two cases: If the
“units digit” is 0, we change it to 1; that adds $F_2 = 1$, since the units digit
refers to $F_2$. Otherwise the two least significant digits will be 01, and we
change them to 10 (thereby adding $F_3 - F_2 = 1$). Finally, we must “carry” as
much as necessary by changing the digit pattern ‘011’ to ‘100’ until there are
no two 1’s in a row. (This carry rule is equivalent to replacing $F_{m+1} + F_m$
by $F_{m+2}$.) For example, to go from $5 = (1000)_F$ to $6 = (1001)_F$ or from
$6 = (1001)_F$ to $7 = (1010)_F$ requires no carrying; but to go from $7 = (1010)_F$
to $8 = (10000)_F$ we must carry twice.
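Both the digit strings in the table and the add-1 rule can be sketched in a few lines of Python. `fib_digits` (an illustrative name) writes $(b_m \ldots b_2)_F$ greedily, and `increment` (also hypothetical) applies the two cases and the ‘011’ to ‘100’ carry rule just described, reporting how many carries were needed.

```python
def fib_digits(n):
    """Digits (b_m ... b_2)_F of n >= 0 as a string, chosen greedily; '0' for n = 0."""
    fibs = [1, 2]                         # F_2, F_3, ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits).lstrip("0") or "0"


def increment(s):
    """Add 1 to a Fibonacci numeral; return (new numeral, number of carries)."""
    d = [int(c) for c in reversed(s)] + [0, 0]   # d[0] is the F_2 digit
    if d[0] == 0:
        d[0] = 1                  # add F_2 = 1
    else:
        d[0], d[1] = 0, 1         # '01' -> '10' adds F_3 - F_2 = 1
    carries, i = 0, 0
    while i + 1 < len(d):
        if d[i] == 1 and d[i + 1] == 1:
            # '011' -> '100', i.e. F_m + F_{m+1} = F_{m+2}
            d[i] = d[i + 1] = 0
            if i + 2 == len(d):
                d.append(0)
            d[i + 2] = 1
            carries += 1
            i += 2
        else:
            i += 1
    return "".join(map(str, reversed(d))).lstrip("0") or "0", carries


for n in range(1, 21):
    print(n, fib_digits(n).zfill(6))
print(increment("1010"))   # ('10000', 2)
```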

So far we’ve been discussing lots of properties of the Fibonacci numbers,
but we haven’t come up with a closed formula for them. We haven’t found
closed forms for Stirling numbers, Eulerian numbers, or Bernoulli numbers
either; but we were able to discover the closed form $H_n = \left[{n+1 \atop 2}\right]/n!$ for
harmonic numbers. Is there a relation between $F_n$ and other quantities we
know? Can we “solve” the recurrence that defines $F_n$?

The answer is yes. In fact, there’s a simple way to solve the recurrence by
using the idea of generating functions that we looked at briefly in Chapter 5.
Let’s consider the infinite series

$$F(z) = F_0 + F_1z + F_2z^2 + \cdots = \sum_{n\ge0} F_nz^n. \tag{6.116}$$

If we can find a simple formula for F(z), chances are reasonably good that we
can find a simple formula for its coefficients $F_n$.

In Chapter 7 we will focus on generating functions in detail, but it will 
be helpful to have this example under our belts by the time we get there. 
The power series F(z) has a nice property if we look at what happens when
we multiply it by z and by $z^2$:

$$\begin{aligned}
F(z) &= F_0 + F_1z + F_2z^2 + F_3z^3 + F_4z^4 + F_5z^5 + \cdots,\\
zF(z) &= \phantom{F_0 + {}} F_0z + F_1z^2 + F_2z^3 + F_3z^4 + F_4z^5 + \cdots,\\
z^2F(z) &= \phantom{F_0 + F_1z + {}} F_0z^2 + F_1z^3 + F_2z^4 + F_3z^5 + \cdots.
\end{aligned}$$

If we now subtract the last two equations from the first, the terms that involve
$z^2$, $z^3$, and higher powers of z will all disappear, because of the Fibonacci
recurrence. Furthermore the constant term $F_0$ never actually appeared in the
first place, because $F_0 = 0$. Therefore all that’s left after the subtraction is
$(F_1 - F_0)z$, which is just z. In other words,

$$F(z) - zF(z) - z^2F(z) = z,$$

and solving for F(z) gives us the compact formula

$$F(z) = \frac{z}{1 - z - z^2}. \tag{6.117}$$
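The subtraction trick is really a statement about dividing power series: if num(z) = den(z)·F(z), the coefficients of F(z) can be peeled off one at a time. A minimal Python sketch, assuming polynomials are stored as coefficient lists with den[0] = 1 (`series_divide` is a hypothetical helper):

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def series_divide(num, den, terms):
    """First `terms` power-series coefficients of num(z)/den(z).

    Polynomials are coefficient lists (index = power of z); den[0] must be 1
    so that the division stays in integers.
    """
    assert den[0] == 1
    out = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        # num = den * out, so out[n] = num[n] - sum_{j>=1} den[j] * out[n-j]
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * out[n - j]
        out.append(c)
    return out


# F(z) = z / (1 - z - z^2), equation (6.117)
print(series_divide([0, 1], [1, -1, -1], 12))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```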
We have now boiled down all the information in the Fibonacci sequence
to a simple (although unrecognizable) expression $z/(1 - z - z^2)$. This, believe
it or not, is progress, because we can factor the denominator and then use 
partial fractions to achieve a formula that we can easily expand in power series. 
The coefficients in this power series will be a closed form for the Fibonacci 
numbers. 

The plan of attack just sketched can perhaps be understood better if
we approach it backwards. If we have a simpler generating function, say
$1/(1 - \alpha z)$ where α is a constant, we know the coefficients of all powers of z,
because

$$\frac{1}{1 - \alpha z} = 1 + \alpha z + \alpha^2z^2 + \alpha^3z^3 + \cdots.$$

Similarly, if we have a generating function of the form $A/(1 - \alpha z) + B/(1 - \beta z)$,
the coefficients are easily determined, because

$$\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = A\sum_{n\ge0}(\alpha z)^n + B\sum_{n\ge0}(\beta z)^n
= \sum_{n\ge0}(A\alpha^n + B\beta^n)z^n. \tag{6.118}$$

Therefore all we have to do is find constants A, B, α, and β such that

$$\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{z}{1 - z - z^2},$$

and we will have found a closed form $A\alpha^n + B\beta^n$ for the coefficient $F_n$ of $z^n$
in F(z). The left-hand side can be rewritten

$$\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{A - A\beta z + B - B\alpha z}{(1 - \alpha z)(1 - \beta z)},$$

so the four constants we seek are the solutions to two polynomial equations:

$$(1 - \alpha z)(1 - \beta z) = 1 - z - z^2; \tag{6.119}$$

$$(A + B) - (A\beta + B\alpha)z = z. \tag{6.120}$$

We want to factor the denominator of F(z) into the form $(1 - \alpha z)(1 - \beta z)$;
then we will be able to express F(z) as the sum of two fractions in which the
factors $(1 - \alpha z)$ and $(1 - \beta z)$ are conveniently separated from each other.

Notice that the denominator factors in (6.119) have been written in the
form $(1 - \alpha z)(1 - \beta z)$, instead of the more usual form $c(z - \rho_1)(z - \rho_2)$ where
$\rho_1$ and $\rho_2$ are the roots. The reason is that $(1 - \alpha z)(1 - \beta z)$ leads to nicer
expansions in power series.
As usual, the authors can’t resist a trick.


We can find α and β in several ways, one of which uses a slick trick: Let
us introduce a new variable w and try to find the factorization

$$w^2 - wz - z^2 = (w - \alpha z)(w - \beta z).$$

Then we can simply set w = 1 and we’ll have the factors of $1 - z - z^2$. The
roots of $w^2 - wz - z^2 = 0$ can be found by the quadratic formula; they are

$$\frac{z \pm \sqrt{z^2 + 4z^2}}{2} = \frac{1 \pm \sqrt5}{2}\,z.$$

Therefore

$$w^2 - wz - z^2 = \left(w - \frac{1 + \sqrt5}{2}\,z\right)\left(w - \frac{1 - \sqrt5}{2}\,z\right),$$

and we have the constants α and β we were looking for.

The ratio of one’s height to the height of one’s navel is approximately 1.618, according to extensive empirical observations by European scholars [136].

The number $(1 + \sqrt5)/2 \approx 1.61803$ is important in many parts of mathematics
as well as in the art world, where it has been considered since ancient
times to be the most pleasing ratio for many kinds of design. Therefore it
has a special name, the golden ratio. We denote it by the Greek letter φ, in
honor of Phidias who is said to have used it consciously in his sculpture. The
other root $(1 - \sqrt5)/2 = -1/\phi \approx -.61803$ shares many properties of φ, so it
has the special name $\hat\phi$, “phi hat.” These numbers are roots of the equation
$w^2 - w - 1 = 0$, so we have

$$\phi^2 = \phi + 1; \qquad \hat\phi^2 = \hat\phi + 1. \tag{6.121}$$

(More about φ and $\hat\phi$ later.)

We have found the constants α = φ and β = $\hat\phi$ needed in (6.119); now
we merely need to find A and B in (6.120). Setting z = 0 in that equation
tells us that B = −A, so (6.120) boils down to

$$-\hat\phi A + \phi A = 1.$$

The solution is $A = 1/(\phi - \hat\phi) = 1/\sqrt5$; the partial fraction expansion of
(6.117) is therefore

$$F(z) = \frac{1}{\sqrt5}\left(\frac{1}{1 - \phi z} - \frac{1}{1 - \hat\phi z}\right). \tag{6.122}$$

Good, we’ve got F(z) right where we want it. Expanding the fractions into
power series as in (6.118) gives a closed form for the coefficient of $z^n$:

$$F_n = \frac{1}{\sqrt5}\,(\phi^n - \hat\phi^n). \tag{6.123}$$

(This formula was first published by Leonhard Euler [113] in 1765, but people
forgot about it until it was rediscovered by Jacques Binet [31] in 1843.)
Before we stop to marvel at our derivation, we should check its accuracy.
For n = 0 the formula correctly gives $F_0 = 0$; for n = 1, it gives $F_1 =
(\phi - \hat\phi)/\sqrt5$, which is indeed 1. For higher powers, equations (6.121) show
that the numbers defined by (6.123) satisfy the Fibonacci recurrence, so they
must be the Fibonacci numbers by induction. (We could also expand $\phi^n$
and $\hat\phi^n$ by the binomial theorem and chase down the various powers of $\sqrt5$;
but that gets pretty messy. The point of a closed form is not necessarily to
provide us with a fast method of calculation, but rather to tell us how $F_n$
relates to other quantities in mathematics.)

With a little clairvoyance we could simply have guessed formula (6.123) 
and proved it by induction. But the method of generating functions is a pow- 
erful way to discover it; in Chapter 7 we’ll see that the same method leads us 
to the solution of recurrences that are considerably more difficult. Inciden- 
tally, we never worried about whether the infinite sums in our derivation of 
(6.123) were convergent; it turns out that most operations on the coefficients 
of power series can be justified rigorously whether or not the sums actually 
converge [182]. Still, skeptical readers who suspect fallacious reasoning with
infinite sums can take comfort in the fact that equation (6.123), once found 
by using infinite series, can be verified by a solid induction proof. 
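The Euler-Binet formula can also be checked exactly, without floating point, by working with numbers of the form $a + b\sqrt5$ with rational a and b. This Python sketch makes that representation its own assumption (pairs of `Fraction`s); since $\hat\phi^n$ is the conjugate of $\phi^n$, the quotient $(\phi^n - \hat\phi^n)/\sqrt5$ comes out rational and should equal $F_n$. The names are illustrative.

```python
from fractions import Fraction


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def mul(x, y):
    """Multiply a + b*sqrt(5) by c + d*sqrt(5); numbers are (a, b) pairs."""
    a, b = x
    c, d = y
    return (a * c + 5 * b * d, a * d + b * c)


def power(x, n):
    result = (Fraction(1), Fraction(0))
    for _ in range(n):
        result = mul(result, x)
    return result


PHI = (Fraction(1, 2), Fraction(1, 2))        # (1 + sqrt 5)/2
PHI_HAT = (Fraction(1, 2), Fraction(-1, 2))   # (1 - sqrt 5)/2


def binet(n):
    """(phi^n - phi_hat^n)/sqrt 5 from (6.123), computed exactly."""
    a1, b1 = power(PHI, n)
    a2, b2 = power(PHI_HAT, n)
    assert a1 == a2            # rational parts cancel (conjugates)
    value = b1 - b2            # (b1 - b2) * sqrt 5, divided by sqrt 5
    assert value.denominator == 1
    return int(value)


print(binet(10), binet(11))    # 55 89
```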

One of the interesting consequences of (6.123) is that the integer $F_n$ is
extremely close to the irrational number $\phi^n/\sqrt5$ when n is large. (Since $\hat\phi$ is
less than 1 in absolute value, $\hat\phi^n$ becomes exponentially small and its effect
is almost negligible.) For example, $F_{10} = 55$ and $F_{11} = 89$ are very near

$$\frac{\phi^{10}}{\sqrt5} \approx 55.00364 \qquad\text{and}\qquad \frac{\phi^{11}}{\sqrt5} \approx 88.99775.$$

We can use this observation to derive another closed form,

$$F_n = \frac{\phi^n}{\sqrt5} \text{ rounded to the nearest integer}, \tag{6.124}$$

because $|\hat\phi^n/\sqrt5| < \frac12$ for all $n \ge 0$. When n is even, $F_n$ is a little bit less
than $\phi^n/\sqrt5$; otherwise it is a little greater.

Cassini’s identity (6.103) can be rewritten

$$\frac{F_{n+1}}{F_n} - \frac{F_n}{F_{n-1}} = \frac{(-1)^n}{F_{n-1}F_n}.$$

When n is large, $1/F_{n-1}F_n$ is very small, so $F_{n+1}/F_n$ must be very nearly
the same as $F_n/F_{n-1}$; and (6.124) tells us that this ratio approaches φ. In
fact, we have

$$F_{n+1} = \phi F_n + \hat\phi^n. \tag{6.125}$$
If the USA ever goes metric, our speed limit signs will go from 55 mi/hr to 89 km/hr. Or maybe the highway people will be generous and let us go 90.

The “shift down” rule changes n to f(n/φ) and the “shift up” rule changes n to f(nφ), where $f(x) = \lfloor x + \phi^{-1}\rfloor$.

(This identity is true by inspection when n = 0 or n = 1, and by induction
when n > 1; we can also prove it directly by plugging in (6.123).) The ratio
$F_{n+1}/F_n$ is very close to φ, which it alternately overshoots and undershoots.
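Formulas (6.124) and (6.125), and the alternating overshoot, are easy to observe numerically. The following sketch uses ordinary floating point, so it is only trustworthy for moderate n (roughly n ≤ 60 for the rounding formula); `fib_rounded` and the other names are illustrative.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2
PHI_HAT = (1 - SQRT5) / 2


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_rounded(n):
    """(6.124): phi^n / sqrt 5 rounded to the nearest integer, in floating point."""
    return math.floor(PHI ** n / SQRT5 + 0.5)


for n in range(8, 12):
    ratio = fib(n + 1) / fib(n)
    # (6.125) says F_{n+1} - phi F_n = phi_hat^n, which alternates in sign.
    print(n, fib_rounded(n), fib(n), ratio > PHI, fib(n + 1) - PHI * fib(n), PHI_HAT ** n)
```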

By coincidence, φ is also very nearly the number of kilometers in a mile.
(The exact number is 1.609344, since 1 inch is exactly 2.54 centimeters.)
This gives us a handy way to convert mentally between kilometers and miles,
because a distance of $F_{n+1}$ kilometers is (very nearly) a distance of $F_n$ miles.

Suppose we want to convert a non-Fibonacci number from kilometers
to miles; what is 30 km, American style? Easy: We just use the Fibonacci
number system and mentally convert 30 to its Fibonacci representation 21 +
8 + 1 by the greedy approach explained earlier. Now we can shift each number
down one notch, getting 13 + 5 + 1. (The former ‘1’ was $F_2$, since $k_r \gg 0$ in
(6.113); the new ‘1’ is $F_1$.) Shifting down divides by φ, more or less. Hence
19 miles is our estimate. (That’s pretty close; the correct answer is about
18.64 miles.) Similarly, to go from miles to kilometers we can shift up a
notch; 30 miles is approximately 34 + 13 + 2 = 49 kilometers. (That’s not
quite as close; the correct number is about 48.28.)

It turns out that this shift-down rule gives the correctly rounded number
of miles per n kilometers for all $n \le 100$, except in the cases n = 4, 12, 62, 75,
91, and 96, when it is off by less than 2/3 mile. And the shift-up rule gives
either the correctly rounded number of kilometers for n miles, or 1 km too
many, for all $n \le 126$. (The only really embarrassing case is n = 4, where the
individual rounding errors for n = 3 + 1 both go the same direction instead
of cancelling each other out.)
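The shift-down and shift-up rules can be sketched in Python as follows; the helper names are hypothetical. The two print lines reproduce the 30 km and 30 mile examples, and the same functions can be used to spot-check the exceptional values of n listed above or the margin note’s formula $f(x) = \lfloor x + \phi^{-1}\rfloor$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
KM_PER_MILE = 1.609344          # exact, since 1 inch = 2.54 cm


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def zeckendorf(n):
    """Indices k_1 > ... > k_r of the greedy Fibonacci representation of n."""
    table = []
    k, f, f_next = 2, 1, 2
    while f <= n:
        table.append((k, f))
        k, f, f_next = k + 1, f_next, f + f_next
    indices = []
    for k, f in reversed(table):
        if f <= n:
            indices.append(k)
            n -= f
    return indices


def shift_down(km):
    """Kilometers to (approximate) miles: each F_k becomes F_{k-1}."""
    return sum(fib(k - 1) for k in zeckendorf(km))


def shift_up(miles):
    """Miles to (approximate) kilometers: each F_k becomes F_{k+1}."""
    return sum(fib(k + 1) for k in zeckendorf(miles))


print(shift_down(30), round(30 / KM_PER_MILE, 2))   # 19 18.64
print(shift_up(30), round(30 * KM_PER_MILE, 2))     # 49 48.28
```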


6.7 CONTINUANTS 

Fibonacci numbers have important connections to the Stern-Brocot 
tree that we studied in Chapter 4, and they have important generalizations to 
a sequence of polynomials that Euler studied extensively. These polynomials 
are called continuants, because they are the key to the study of continued 
fractions like 


a 0 + 


1 


1 

<+ + 

0-2 4 

a 3 + 

&4 + 


1 

1 

i 

T 


a 6 + 


1 

Q7 


(6.126) 


d5 + 
The continuant polynomial $K_n(x_1, x_2, \ldots, x_n)$ has n parameters, and it
is defined by the following recurrence:

$$\begin{aligned}
K_0() &= 1;\\
K_1(x_1) &= x_1;\\
K_n(x_1, \ldots, x_n) &= K_{n-1}(x_1, \ldots, x_{n-1})\,x_n + K_{n-2}(x_1, \ldots, x_{n-2}).
\end{aligned} \tag{6.127}$$

For example, the next three cases after $K_1(x_1)$ are

$$\begin{aligned}
K_2(x_1, x_2) &= x_1x_2 + 1;\\
K_3(x_1, x_2, x_3) &= x_1x_2x_3 + x_1 + x_3;\\
K_4(x_1, x_2, x_3, x_4) &= x_1x_2x_3x_4 + x_1x_2 + x_1x_4 + x_3x_4 + 1.
\end{aligned}$$

It’s easy to see, inductively, that the number of terms is a Fibonacci number:

$$K_n(1, 1, \ldots, 1) = F_{n+1}. \tag{6.128}$$

When the number of parameters is implied by the context, we can write
simply ‘K’ instead of ‘$K_n$’, just as we can omit the number of parameters
when we use the hypergeometric functions F of Chapter 5. For example,
$K(x_1, x_2) = K_2(x_1, x_2) = x_1x_2 + 1$. The subscript n is of course necessary in
formulas like (6.128).
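Definition (6.127) runs naturally from left to right. A minimal Python sketch; the name `K` mirrors the text’s notation, but the code itself is illustrative.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def K(*xs):
    """Continuant K_n(x_1, ..., x_n), computed left to right with (6.127).

    The loop starts from the pair (K_{-1}, K_0) = (0, 1); the value K_{-1} = 0
    is only a convenience so that K_1(x_1) = x_1 comes out of the same step.
    """
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


print(K(), K(2), K(2, 3), K(2, 3, 5), K(2, 3, 5, 7))   # 1 2 7 37 266
print([K(*[1] * n) for n in range(8)])                  # [1, 1, 2, 3, 5, 8, 13, 21]
```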

Euler observed that $K(x_1, x_2, \ldots, x_n)$ can be obtained by starting with
the product $x_1x_2\ldots x_n$ and then striking out adjacent pairs $x_kx_{k+1}$ in all
possible ways. We can represent Euler’s rule graphically by constructing all
“Morse code” sequences of dots and dashes having length n, where each dot
contributes 1 to the length and each dash contributes 2; here are the Morse
code sequences of length 4:

····     ··−     ·−·     −··     −−

These dot-dash patterns correspond to the terms of $K(x_1, x_2, x_3, x_4)$; a dot
signifies a variable that’s included and a dash signifies a pair of variables
that’s excluded. For example, ·−· corresponds to $x_1x_4$.

A Morse code sequence of length n that has k dashes has n − 2k dots and
n − k symbols altogether. These dots and dashes can be arranged in $\binom{n-k}{k}$
ways; therefore if we replace each dot by z and each dash by 1 we get

$$K_n(z, z, \ldots, z) = \sum_{k=0}^{n} \binom{n-k}{k} z^{n-2k}. \tag{6.129}$$
We also know that the total number of terms in a continuant is a Fibonacci
number; hence we have the identity

$$F_{n+1} = \sum_{k=0}^{n} \binom{n-k}{k}. \tag{6.130}$$

(A closed form for (6.129), generalizing the Euler-Binet formula (6.123) for
Fibonacci numbers, appears in (5.74).)
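Euler’s striking-out rule and identities (6.129) and (6.130) can be checked by enumerating Morse code sequences directly. In this Python sketch dots are written ‘.’ and dashes ‘-’; `morse_codes` and `euler_rule` are hypothetical helper names.

```python
from math import comb


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def K(*xs):
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


def morse_codes(n):
    """All strings of '.' (length 1) and '-' (length 2) with total length n."""
    if n < 2:
        return ["." * n]          # "" for n = 0, "." for n = 1
    return ["." + s for s in morse_codes(n - 1)] + ["-" + s for s in morse_codes(n - 2)]


def euler_rule(xs):
    """Sum over Morse codes: a dot keeps one variable, a dash strikes out a pair."""
    total = 0
    for code in morse_codes(len(xs)):
        term, i = 1, 0
        for symbol in code:
            if symbol == ".":
                term *= xs[i]
                i += 1
            else:
                i += 2
        total += term
    return total


print(morse_codes(4))                              # ['....', '..-', '.-.', '-..', '--']
print(euler_rule([2, 3, 5, 7]), K(2, 3, 5, 7))     # 266 266
```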

The relation between continuant polynomials and Morse code sequences
shows that continuants have a mirror symmetry:

$$K(x_n, \ldots, x_2, x_1) = K(x_1, x_2, \ldots, x_n). \tag{6.131}$$

Therefore they obey a recurrence that adjusts parameters at the left, in addition
to the right-adjusting recurrence in definition (6.127):

$$K_n(x_1, \ldots, x_n) = x_1K_{n-1}(x_2, \ldots, x_n) + K_{n-2}(x_3, \ldots, x_n). \tag{6.132}$$

Both of these recurrences are special cases of a more general law:

$$\begin{aligned}
K_{m+n}(x_1, \ldots, x_m, x_{m+1}, \ldots, x_{m+n})
&= K_m(x_1, \ldots, x_m)\,K_n(x_{m+1}, \ldots, x_{m+n})\\
&\quad + K_{m-1}(x_1, \ldots, x_{m-1})\,K_{n-1}(x_{m+2}, \ldots, x_{m+n}).
\end{aligned} \tag{6.133}$$

This law is easily understood from the Morse code analogy: The first product
$K_mK_n$ yields the terms of $K_{m+n}$ in which there is no dash in the [m, m + 1]
position, while the second product yields the terms in which there is a dash
there. If we set all the x’s equal to 1, this identity tells us that $F_{m+n+1} =
F_{m+1}F_{n+1} + F_mF_n$; thus, (6.108) is a special case of (6.133).

Euler [112] discovered that continuants obey an even more remarkable
law, which generalizes Cassini’s identity:

$$\begin{aligned}
K_{m+n}(x_1, \ldots, x_{m+n})\,K_k(x_{m+1}, \ldots, x_{m+k})
&= K_{m+k}(x_1, \ldots, x_{m+k})\,K_n(x_{m+1}, \ldots, x_{m+n})\\
&\quad + (-1)^k K_{m-1}(x_1, \ldots, x_{m-1})\,K_{n-k-1}(x_{m+k+2}, \ldots, x_{m+n}).
\end{aligned} \tag{6.134}$$

This law (proved in exercise 29) holds whenever the subscripts on the K’s are
all nonnegative. For example, when k = 2, m = 1, and n = 3, we have

$$K(x_1, x_2, x_3, x_4)\,K(x_2, x_3) = K(x_1, x_2, x_3)\,K(x_2, x_3, x_4) + 1.$$
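Identities (6.131) through (6.134) are polynomial identities, so testing them on many random integer sequences is a cheap sanity check (not a proof). The Python sketch below uses 0-based lists, so the text’s $x_1$ is `x[0]`; the helper names are illustrative.

```python
import random


def K(*xs):
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


def check_identities(x):
    """Assert (6.131)-(6.134) for one list x; the text's x_1 is x[0] here."""
    total = len(x)
    assert K(*x) == K(*reversed(x))                                  # (6.131)
    if total >= 2:
        assert K(*x) == x[0] * K(*x[1:]) + K(*x[2:])                 # (6.132)
    for m in range(1, total):
        n = total - m
        # (6.133): no dash / a dash across the boundary between x_m and x_{m+1}
        assert K(*x) == K(*x[:m]) * K(*x[m:]) + K(*x[:m - 1]) * K(*x[m + 1:])
        for k in range(n):           # keeps the subscript n - k - 1 nonnegative
            lhs = K(*x) * K(*x[m:m + k])
            rhs = (K(*x[:m + k]) * K(*x[m:])
                   + (-1) ** k * K(*x[:m - 1]) * K(*x[m + k + 1:]))
            assert lhs == rhs                                        # (6.134)
    return True


rng = random.Random(2024)
for _ in range(200):
    check_identities([rng.randint(-5, 9) for _ in range(rng.randint(0, 8))])
print(K(2, 3, 5, 7) * K(3, 5), K(2, 3, 5) * K(3, 5, 7) + 1)   # 4256 4256
```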


Continuant polynomials are intimately connected with Euclid’s algo- 
rithm. Suppose, for example, that the computation of gcd(m, n) finishes 
in four steps:

$$\begin{aligned}
\gcd(m, n) &= \gcd(n_0, n_1) & n_0 &= m, \quad n_1 = n;\\
&= \gcd(n_1, n_2) & n_2 &= n_0 \bmod n_1 = n_0 - q_1n_1;\\
&= \gcd(n_2, n_3) & n_3 &= n_1 \bmod n_2 = n_1 - q_2n_2;\\
&= \gcd(n_3, n_4) & n_4 &= n_2 \bmod n_3 = n_2 - q_3n_3;\\
&= \gcd(n_4, 0) = n_4. & 0 &= n_3 \bmod n_4 = n_3 - q_4n_4.
\end{aligned}$$

Then we have

$$\begin{aligned}
n_4 &= n_4 &&= K()\,n_4;\\
n_3 &= q_4n_4 &&= K(q_4)\,n_4;\\
n_2 &= q_3n_3 + n_4 &&= K(q_3, q_4)\,n_4;\\
n_1 &= q_2n_2 + n_3 &&= K(q_2, q_3, q_4)\,n_4;\\
n_0 &= q_1n_1 + n_2 &&= K(q_1, q_2, q_3, q_4)\,n_4.
\end{aligned}$$

In general, if Euclid’s algorithm finds the greatest common divisor d in k steps,
after computing the sequence of quotients $q_1, \ldots, q_k$, then the starting numbers
were $K(q_1, q_2, \ldots, q_k)\,d$ and $K(q_2, \ldots, q_k)\,d$. (This fact was noticed early
in the eighteenth century by Thomas Fantet de Lagny [232], who seems to 
have been the first person to consider continuants explicitly. Lagny pointed 
out that consecutive Fibonacci numbers, which occur as continuants when the 
q’s take their minimum values, are therefore the smallest inputs that cause 
Euclid’s algorithm to take a given number of steps.) 
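The connection with Euclid’s algorithm is easy to confirm: record the quotients, then rebuild both starting numbers from continuants. A Python sketch with illustrative names; the last line shows the all-1’s-then-2 quotient pattern that makes consecutive Fibonacci numbers the worst case.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def K(*xs):
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


def euclid_quotients(m, n):
    """Quotients q_1, ..., q_k of Euclid's algorithm on (m, n), and the gcd."""
    qs = []
    while n:
        q, r = divmod(m, n)
        qs.append(q)
        m, n = n, r
    return qs, m


qs, d = euclid_quotients(1071, 462)
print(qs, d)                                # [2, 3, 7] 21
print(K(*qs) * d, K(*qs[1:]) * d)           # 1071 462
print(euclid_quotients(fib(10), fib(9)))    # ([1, 1, 1, 1, 1, 1, 1, 2], 1)
```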

Continuants are also intimately connected with continued fractions, from
which they get their name. We have, for example,

$$q_0 + \cfrac{1}{q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3}}} = \frac{K(q_0, q_1, q_2, q_3)}{K(q_1, q_2, q_3)}. \tag{6.135}$$

The same pattern holds for continued fractions of any depth. It is easily
proved by induction; we have, for example,

$$\frac{K(q_0, q_1, q_2, q_3 + 1/q_4)}{K(q_1, q_2, q_3 + 1/q_4)} = \frac{K(q_0, q_1, q_2, q_3, q_4)}{K(q_1, q_2, q_3, q_4)},$$

because of the identity

$$K_n(x_1, \ldots, x_{n-1}, x_n + y) = K_n(x_1, \ldots, x_{n-1}, x_n) + K_{n-1}(x_1, \ldots, x_{n-1})\,y. \tag{6.136}$$

(This identity is proved and generalized in exercise 30.)
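Equation (6.135), in its general form, can be checked with exact rational arithmetic. This Python sketch (`continued_fraction` is a hypothetical helper) evaluates a finite continued fraction from the inside out and compares it with the ratio of continuants.

```python
from fractions import Fraction
from itertools import product


def K(*xs):
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


def continued_fraction(qs):
    """Value of q_0 + 1/(q_1 + 1/(... + 1/q_last)), computed exactly, inside out."""
    value = Fraction(qs[-1])
    for q in reversed(qs[:-1]):
        value = q + 1 / value
    return value


print(continued_fraction([2, 3, 7]), Fraction(K(2, 3, 7), K(3, 7)))   # 51/22 51/22

# Spot-check (6.135) on every list of partial quotients from {1, 2, 3}, lengths 1..5.
for length in range(1, 6):
    for qs in product(range(1, 4), repeat=length):
        assert continued_fraction(qs) == Fraction(K(*qs), K(*qs[1:]))
```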
Moreover, continuants are closely connected with the Stern-Brocot tree
discussed in Chapter 4. Each node in that tree can be represented as a
sequence of L’s and R’s, say

$$R^{a_0}L^{a_1}R^{a_2}L^{a_3}\ldots R^{a_{n-2}}L^{a_{n-1}}, \tag{6.137}$$

where $a_0 \ge 0$, $a_1 \ge 1$, $a_2 \ge 1$, $a_3 \ge 1$, . . . , $a_{n-2} \ge 1$, $a_{n-1} \ge 0$, and n is
even. Using the 2 × 2 matrices L and R of (4.33), it is not hard to prove by
induction that the matrix equivalent of (6.137) is

$$\begin{pmatrix}
K_{n-2}(a_1, \ldots, a_{n-2}) & K_{n-1}(a_1, \ldots, a_{n-2}, a_{n-1})\\
K_{n-1}(a_0, a_1, \ldots, a_{n-2}) & K_n(a_0, a_1, \ldots, a_{n-2}, a_{n-1})
\end{pmatrix}. \tag{6.138}$$

(The proof is part of exercise 87.) For example,

$$R^aL^bR^cL^d = \begin{pmatrix} bc + 1 & bcd + b + d\\ abc + a + c & abcd + ab + ad + cd + 1 \end{pmatrix}.$$

Finally, therefore, we can use (4.34) to write a closed form for the fraction in
the Stern-Brocot tree whose L-and-R representation is (6.137):

$$f(R^{a_0}\ldots L^{a_{n-1}}) = \frac{K_{n+1}(a_0, a_1, \ldots, a_{n-1}, 1)}{K_n(a_1, \ldots, a_{n-1}, 1)}. \tag{6.139}$$

(This is “Halphen’s theorem” [174].) For example, to find the fraction for
LRRL we have $a_0 = 0$, $a_1 = 1$, $a_2 = 2$, $a_3 = 1$, and n = 4; equation (6.139)
gives

$$\frac{K(0, 1, 2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 2)}{K(3, 2)} = \frac57.$$

(We have used the rule $K_n(x_1, \ldots, x_{n-1}, x_n + 1) = K_{n+1}(x_1, \ldots, x_{n-1}, x_n, 1)$
to absorb leading and trailing 1’s in the parameter lists; this rule is obtained
by setting y = 1 in (6.136).)
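Halphen’s theorem can be compared against the mediant construction of Chapter 4. In this Python sketch (names hypothetical; paths are strings over ‘L’ and ‘R’), `exponents` splits a path into the run lengths $a_0, \ldots, a_{n-1}$ of (6.137), `halphen` applies (6.139), and `mediant_walk` finds the same node by repeatedly taking mediants, starting between 0/1 and 1/0.

```python
from fractions import Fraction
from itertools import product


def K(*xs):
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur


def exponents(path):
    """Run lengths a_0, ..., a_{n-1} of path = R^{a_0} L^{a_1} ... L^{a_{n-1}}, n even."""
    runs, want, i = [], "R", 0
    while i < len(path) or len(runs) % 2 == 1:
        run = 0
        while i < len(path) and path[i] == want:
            run += 1
            i += 1
        runs.append(run)
        want = "L" if want == "R" else "R"
    return runs or [0, 0]          # the root itself is R^0 L^0


def halphen(path):
    """Fraction at this Stern-Brocot node, by (6.139)."""
    a = exponents(path)
    return Fraction(K(*a, 1), K(*a[1:], 1))


def mediant_walk(path):
    """The same node found directly, by repeatedly taking mediants."""
    left, right, node = (0, 1), (1, 0), (1, 1)   # (numerator, denominator) pairs
    for step in path:
        if step == "L":
            right = node
        else:
            left = node
        node = (left[0] + right[0], left[1] + right[1])
    return Fraction(*node)


print(exponents("LRRL"), halphen("LRRL"), mediant_walk("LRRL"))   # [0, 1, 2, 1] 5/7 5/7

# Compare the two methods on every path of length at most 10.
for length in range(11):
    for steps in product("LR", repeat=length):
        path = "".join(steps)
        assert halphen(path) == mediant_walk(path)
```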

A comparison of (6.135) and (6.139) shows that the fraction corresponding
to a general node (6.137) in the Stern-Brocot tree has the continued
fraction representation

$$f(R^{a_0}\ldots L^{a_{n-1}}) = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{1}}}}}\,. \tag{6.140}$$
Thus we can convert at sight between continued fractions and the corresponding
nodes in the Stern-Brocot tree. For example,

$$f(LRRL) = 0 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1}}}}\,.$$

We observed in Chapter 4 that irrational numbers define infinite paths in
the Stern-Brocot tree, and that they can be represented as an infinite string
of L’s and R’s. If the infinite string for α is $R^{a_0}L^{a_1}R^{a_2}L^{a_3}\ldots$, there is a
corresponding infinite continued fraction

$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cfrac{1}{a_5 + \cdots}}}}}\,. \tag{6.141}$$

This infinite continued fraction can also be obtained directly: Let $\alpha_0 = \alpha$ and
for $k \ge 0$ let

$$a_k = \lfloor\alpha_k\rfloor; \qquad \alpha_k = a_k + \frac{1}{\alpha_{k+1}}. \tag{6.142}$$

The a’s are called the “partial quotients” of α. If α is rational, say m/n,
this process runs through the quotients found by Euclid’s algorithm and then
stops (with $\alpha_{k+1} = \infty$).

Is Euler’s constant γ rational or irrational? Nobody knows. We can get
partial information about this famous unsolved problem by looking for γ in
the Stern-Brocot tree; if it’s rational we will find it, and if it’s irrational we
will find all the closest rational approximations to it. The continued fraction
for γ begins with the following partial quotients:

$$\begin{array}{c|ccccccccc}
k & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ \hline
a_k & 0 & 1 & 1 & 2 & 1 & 2 & 1 & 4 & 3
\end{array}$$

Therefore its Stern-Brocot representation begins LRLLRLLRLLLLRRRL . . . ; no
pattern is evident. Calculations by Richard Brent [38] have shown that, if γ
is rational, its denominator must be more than 10,000 decimal digits long.


Or if they do, 
they’re not talking. 
Well, γ must be irrational, because of a little-known Einsteinian assertion: “God does not throw huge denominators at the universe.”

Therefore nobody believes that γ is rational; but nobody so far has been able
to prove that it isn’t.
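The recipe (6.142) is a few lines of Python when α is rational. The sketch below runs it on 5/7 and on a 40-digit decimal truncation of γ; that truncation is an assumption of the example (it is rational, so only its leading partial quotients can be trusted to agree with γ’s). The names are illustrative.

```python
import math
from fractions import Fraction


def partial_quotients(alpha, limit=20):
    """Partial quotients by (6.142): a_k = floor(alpha_k), alpha_{k+1} = 1/(alpha_k - a_k).

    alpha should be a Fraction so every step is exact; the loop stops when some
    alpha_k is an integer (alpha_{k+1} would be infinite) or after `limit` terms.
    """
    quotients = []
    for _ in range(limit):
        a = math.floor(alpha)
        quotients.append(a)
        if alpha == a:
            break
        alpha = 1 / (alpha - a)
    return quotients


def stern_brocot_path(quotients):
    """The string R^{a_0} L^{a_1} R^{a_2} ... for the given partial quotients."""
    return "".join(("R" if k % 2 == 0 else "L") * a for k, a in enumerate(quotients))


# Euler's constant truncated to 40 decimal places (an approximation, not gamma itself).
GAMMA_40 = Fraction("0.5772156649015328606065120900824024310421")

print(partial_quotients(Fraction(5, 7)))                     # [0, 1, 2, 2]
print(partial_quotients(GAMMA_40)[:9])                       # [0, 1, 1, 2, 1, 2, 1, 4, 3]
print(stern_brocot_path(partial_quotients(GAMMA_40)[:9]))    # LRLLRLLRLLLLRRR
```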

Let’s conclude this chapter by proving a remarkable identity that ties a lot
of these ideas together. We introduced the notion of spectrum in Chapter 3;
the spectrum of α is the multiset of numbers $\lfloor n\alpha\rfloor$, where α is a given constant.
The infinite series

$$\sum_{n\ge1} z^{\lfloor n\phi\rfloor} = z + z^3 + z^4 + z^6 + z^8 + z^9 + \cdots$$

can therefore be said to be the generating function for the spectrum of φ,
where $\phi = (1 + \sqrt5)/2$ is the golden ratio. The identity we will prove, discovered
in 1976 by J. L. Davison [73], is an infinite continued fraction that
relates this generating function to the Fibonacci sequence:

$$\cfrac{z^{F_1}}{1 + \cfrac{z^{F_2}}{1 + \cfrac{z^{F_3}}{1 + \cfrac{z^{F_4}}{1 + \cdots}}}} = (1 - z)\sum_{n\ge1} z^{\lfloor n\phi\rfloor}. \tag{6.143}$$


Both sides of (6.143) are interesting; let’s look first at the numbers $\lfloor n\phi\rfloor$.
If the Fibonacci representation (6.113) of n is $F_{k_1} + \cdots + F_{k_r}$, we expect nφ
to be approximately $F_{k_1+1} + \cdots + F_{k_r+1}$, the number we get from shifting the
Fibonacci representation left (as when converting from miles to kilometers).
In fact, we know from (6.125) that

$$n\phi = F_{k_1+1} + \cdots + F_{k_r+1} - (\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}).$$

Now $\hat\phi = -1/\phi$ and $k_1 \gg \cdots \gg k_r \gg 0$, so we have

$$|\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}| < \phi^{-k_r} + \phi^{-k_r-2} + \phi^{-k_r-4} + \cdots
= \frac{\phi^{-k_r}}{1 - \phi^{-2}} = \phi^{1-k_r} \le \phi^{-1} < 1;$$

and $\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}$ has the same sign as $(-1)^{k_r}$, by a similar argument.
Hence

$$\lfloor n\phi\rfloor = F_{k_1+1} + \cdots + F_{k_r+1} - [k_r(n) \text{ is even}]. \tag{6.144}$$

Let us say that a number n is Fibonacci odd (or F-odd for short) if its least
significant Fibonacci bit is 1; this is the same as saying that $k_r(n) = 2$.
Otherwise n is Fibonacci even (F-even). For example, the smallest F-odd
numbers are 1, 4, 6, 9, 12, 14, 17, and 19. If $k_r(n)$ is even, then n − 1 is
F-even, by (6.114); similarly, if $k_r(n)$ is odd, then n − 1 is F-odd. Therefore

$$k_r(n) \text{ is even} \iff n - 1 \text{ is F-even}.$$

Furthermore, if $k_r(n)$ is even, (6.144) implies that $k_r(\lfloor n\phi\rfloor) = 2$; if $k_r(n)$ is
odd, (6.144) says that $k_r(\lfloor n\phi\rfloor) = k_r(n) + 1$. Therefore $k_r(\lfloor n\phi\rfloor)$ is always
even, and we have proved that

$$\lfloor n\phi\rfloor - 1 \text{ is always F-even}.$$

Conversely, if m is any F-even number, we can reverse this computation and 
find an n such that m + 1 = ( n 4 ) J ■ (First add 1 in F-notation as explained 
earlier. If no carries occur, n is (m + 2 ) shifted right; otherwise n is (m + 1 ) 
shifted right.) The right-hand sum of (6.143) can therefore be written 

zL ni W = z z m [m is F-even] . (6.145) 

n(> 1 m(>0 
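A companion Python sketch, again with hypothetical helper names, lists the smallest F-odd numbers. It then confirms on an initial segment that the numbers $\lfloor n\phi\rfloor - 1$ are exactly the F-even numbers, which is the content of (6.145). Zero has no 1 bits in its Fibonacci representation, so the code treats it as F-even. This matches the $m = 0$ term of (6.145).

```python
from math import isqrt

def is_f_odd(m):
    """True if the least significant Fibonacci bit of m is 1, i.e. k_r(m) = 2.
    m = 0 has no 1 bits, so it counts as F-even."""
    fibs = [1, 2]                         # F_2, F_3, ...
    while fibs[-1] <= m:
        fibs.append(fibs[-1] + fibs[-2])
    smallest_used = None
    for f in reversed(fibs):              # greedy Fibonacci representation
        if f <= m:
            m -= f
            smallest_used = f
    return smallest_used == 1             # F_2 = 1 is the least significant bit

def floor_n_phi(n):
    return (n + isqrt(5 * n * n)) // 2

print([m for m in range(1, 20) if is_f_odd(m)])   # [1, 4, 6, 9, 12, 14, 17, 19]

# floor(n*phi) is increasing in n, so n = 1..M hits every F-even m below floor(M*phi).
M = 5000
shifted_spectrum = {floor_n_phi(n) - 1 for n in range(1, M + 1)}
limit = floor_n_phi(M)
f_even = {m for m in range(limit) if not is_f_odd(m)}
print(shifted_spectrum == f_even)                 # True
```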


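How about the fraction on the left? Let’s rewrite (6.143) so that the
continued fraction looks like (6.141), with all numerators 1:

$$\cfrac{1}{z^{-F_0}+\cfrac{1}{z^{-F_1}+\cfrac{1}{z^{-F_2}+\cfrac{1}{\ddots}}}} = (z^{-1}-1)\sum_{n\ge1} z^{\lfloor n\phi\rfloor}. \tag{6.146}$$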


(This transformation is a bit tricky! The numerator and denominator of the
original fraction having $z^{F_n}$ as numerator should be divided by $z^{F_{n-1}}$.) If
we stop this new continued fraction at $1/z^{-F_n}$, its value will be a ratio of
continuants,

$$\frac{K_{n+2}(0, z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})} = \frac{K_n(z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})},$$

as in (6.135). Let’s look at the denominator first, in hopes that it will be
tractable. Setting $Q_n = K_{n+1}(z^{-F_0}, \ldots, z^{-F_n})$, we find $Q_0 = 1$, $Q_1 = 1+z^{-1}$,
$Q_2 = 1+z^{-1}+z^{-2}$, $Q_3 = 1+z^{-1}+z^{-2}+z^{-3}+z^{-4}$, and in general everything
fits beautifully and gives a geometric series

$$Q_n = 1 + z^{-1} + z^{-2} + \cdots + z^{-(F_{n+2}-1)}.$$

The corresponding numerator is $P_n = K_n(z^{-F_1}, \ldots, z^{-F_n})$; this turns out to
be like $Q_n$ but with fewer terms. For example, we have

$$P_5 = z^{-1} + z^{-2} + z^{-4} + z^{-5} + z^{-7} + z^{-9} + z^{-10} + z^{-12},$$

compared with $Q_5 = 1 + z^{-1} + \cdots + z^{-12}$. A closer look reveals the pattern
governing which terms are present: We have

$$P_5 = \frac{1+z^2+z^3+z^5+z^7+z^8+z^{10}+z^{11}}{z^{12}} = z^{-12}\sum_{m=0}^{12} z^m\,[m\text{ is F-even}];$$

and in general we can prove by induction that

$$P_n = z^{1-F_{n+2}}\sum_{m=0}^{F_{n+2}-1} z^m\,[m\text{ is F-even}].$$

Therefore

$$\frac{P_n}{Q_n} = \frac{\sum_{m=0}^{F_{n+2}-1} z^m\,[m\text{ is F-even}]}{\sum_{m=0}^{F_{n+2}-1} z^m}.$$

Taking the limit as $n\to\infty$ now gives (6.146), because of (6.145).
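The two claims about $Q_n$ and $P_n$ can be checked by computing the continuants symbolically. In this Python sketch the helper names are ours. A polynomial in $x = z^{-1}$ is a dictionary from exponents to coefficients, and each continuant is built with $K_j = K_{j-1}x_j + K_{j-2}$. For small $n$, the loop asserts the geometric-series form of $Q_n$ and the F-even pattern of $P_n$.

```python
def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def continuant(exponents):
    """K(x^e1, ..., x^en) as {exponent: coefficient}, using
    K_{-1} = 0, K_0 = 1 and K_j = K_{j-1} * x^{e_j} + K_{j-2}."""
    prev, cur = {}, {0: 1}
    for e in exponents:
        nxt = {d + e: c for d, c in cur.items()}     # K_{j-1} times x^{e_j}
        for d, c in prev.items():                     # plus K_{j-2}
            nxt[d] = nxt.get(d, 0) + c
        prev, cur = cur, nxt
    return cur

def is_f_even(m):
    """True unless the least significant Fibonacci bit of m is 1 (0 is F-even)."""
    fibs = [1, 2]
    while fibs[-1] <= m:
        fibs.append(fibs[-1] + fibs[-2])
    smallest_used = None
    for f in reversed(fibs):
        if f <= m:
            m -= f
            smallest_used = f
    return smallest_used != 1

# With x = z^(-1), the parameter z^(-F_k) becomes x^(F_k).
for n in range(1, 16):
    top = fib(n + 2)
    Q = continuant([fib(i) for i in range(0, n + 1)])   # Q_n = K_{n+1}(x^F0, ..., x^Fn)
    P = continuant([fib(i) for i in range(1, n + 1)])   # P_n = K_n(x^F1, ..., x^Fn)
    assert Q == {j: 1 for j in range(top)}              # 1 + x + ... + x^(F_{n+2} - 1)
    assert P == {top - 1 - m: 1 for m in range(top) if is_f_even(m)}

print(sorted(continuant([fib(i) for i in range(1, 6)])))   # [1, 2, 4, 5, 7, 9, 10, 12]
```

The printed exponents are those of $x = z^{-1}$ in $P_5$, so they match $P_5 = z^{-1} + z^{-2} + z^{-4} + \cdots + z^{-12}$.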


Exercises 

Warmups 

1 What are the $\left[{4\atop 2}\right] = 11$ permutations of $\{1,2,3,4\}$ that have exactly two
cycles? (The cyclic forms appear in (6.4); non-cyclic forms like 2314 are
desired instead.)

2 There are $m^n$ functions from a set of $n$ elements into a set of $m$ elements.
How many of them range over exactly $k$ different function values?

3 Card stackers in the real world know that it’s wise to allow a bit of slack 
so that the cards will not topple over when a breath of wind comes along. 
Suppose the center of gravity of the top $k$ cards is required to be at least
$\epsilon$ units from the edge of the $(k+1)$st card. (Thus, for example, the first
card can overhang the second by at most $1-\epsilon$ units.) Can we still achieve
arbitrarily large overhang, if we have enough cards?

4 Express $1/1 + 1/3 + \cdots + 1/(2n+1)$ in terms of harmonic numbers.

5 Explain how to get the recurrence (6.75) from the definition of $U_n(x,y)$
in (6.74), and solve the recurrence.

6 An explorer has left a pair of baby rabbits on an island. If baby rabbits
become adults after one month, and if each pair of adult rabbits produces
one pair of baby rabbits every month, how many pairs of rabbits are
present after $n$ months? (After two months there are two pairs, one of
which is newborn.) Find a connection between this problem and the “bee
tree” in the text.


If the harmonic
numbers are worm
numbers, the Fibonacci
numbers are
rabbit numbers.


7 Show that Cassini’s identity (6.103) is a special case of (6.108), and a 
special case of (6.134). 

8 Use the Fibonacci number system to convert 65 mi/hr into an approximate
number of km/hr.

9 About how many square kilometers are in 8 square miles? 

10 What is the continued fraction representation of $\phi$?


Basics 

11 What is $\sum_k (-1)^k\left[{n\atop k}\right]$, the row sum of Stirling’s cycle-number triangle
with alternating signs, when $n$ is a nonnegative integer?

12 Prove that Stirling numbers have an inversion law analogous to (5.48):

$$g(n) = \sum_k \left\{{n\atop k}\right\}(-1)^k f(k) \iff f(n) = \sum_k \left[{n\atop k}\right](-1)^k g(k).$$


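13 The differential operators $D = \frac{d}{dz}$ and $\vartheta = zD$ are mentioned in Chapters
2 and 5. We have

$$\vartheta^2 = z^2D^2 + zD,$$

because $\vartheta^2 f(z) = \vartheta zf'(z) = z\frac{d}{dz}zf'(z) = z^2f''(z) + zf'(z)$, which is
$(z^2D^2+zD)f(z)$. Similarly it can be shown that $\vartheta^3 = z^3D^3 + 3z^2D^2 + zD$.
Prove the general formulas

$$\vartheta^n = \sum_k \left\{{n\atop k}\right\} z^kD^k, \qquad z^nD^n = \sum_k \left[{n\atop k}\right](-1)^{n-k}\vartheta^k,$$

for all $n\ge0$. (These can be used to convert between differential expressions
of the forms $\sum_k \alpha_k z^k f^{(k)}(z)$ and $\sum_k \beta_k \vartheta^k f(z)$, as in (5.109).)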

14 Prove the power identity (6.37) for Eulerian numbers. 

15 Prove the Eulerian identity (6.39) by taking the mth difference of (6.37). 
16 What is the general solution of the double recurrence

$$A_{n,0} = a_n[n\ge0]; \qquad A_{0,k} = 0, \text{ if } k>0; \qquad A_{n,k} = kA_{n-1,k} + A_{n-1,k-1}, \quad \text{integers } k, n,$$

when $k$ and $n$ range over the set of all integers?

17 Solve the following recurrences, assuming that $\left|{n\atop k}\right|$ is zero when $n<0$ or
$k<0$:

a $\left|{n\atop k}\right| = \left|{n-1\atop k}\right| + n\left|{n-1\atop k-1}\right| + [n=k=0]$, for $n,k\ge0$.

b $\left|{n\atop k}\right| = (n-k)\left|{n-1\atop k}\right| + \left|{n-1\atop k-1}\right| + [n=k=0]$, for $n,k\ge0$.

c $\left|{n\atop k}\right| = k\left|{n-1\atop k}\right| + k\left|{n-1\atop k-1}\right| + [n=k=0]$, for $n,k\ge0$.

18 Prove that the Stirling polynomials satisfy

$$(x+1)\,\sigma_n(x+1) = (x-n)\,\sigma_n(x) + x\,\sigma_{n-1}(x).$$

19 Prove that the generalized Stirling numbers satisfy

$$\ldots = 0, \qquad \text{integer } n>0;$$

$$\ldots = 0, \qquad \text{integer } n>0.$$


20 Find a closed form for $\sum_{k=1}^n H_k^{(2)}$.

21 Show that if $H_n = a_n/b_n$, where $a_n$ and $b_n$ are integers, the denominator
$b_n$ is a multiple of $2^{\lfloor\lg n\rfloor}$. Hint: Consider the number $2^{\lfloor\lg n\rfloor-1}H_n - \frac12$.

22 Prove that the infinite sum

$$\sum_{k\ge1}\left(\frac1k - \frac1{k+z}\right)$$

converges for all complex numbers $z$, except when $z$ is a negative integer;
and show that it equals $H_z$ when $z$ is a nonnegative integer. (Therefore we
can use this formula to define harmonic numbers $H_z$ when $z$ is complex.)

23 Equation (6.81) gives the coefficients of $z/(e^z-1)$, when expanded in
powers of $z$. What are the coefficients of $z/(e^z+1)$? Hint: Consider the
identity $(e^z+1)(e^z-1) = e^{2z}-1$.
24 Prove that the tangent number $T_{2n+1}$ is a multiple of $2^n$. Hint: Prove
that all coefficients of $T_{2n}(x)$ and $T_{2n+1}(x)$ are multiples of $2^n$.

25 Equation (6.57) proves that the worm will eventually reach the end of
the rubber band at some time $N$. Therefore there must come a first
time $n$ when he’s closer to the end after $n$ minutes than he was after
$n-1$ minutes. Show that $n < \frac12N$.

26 Use summation by parts to evaluate $S_n = \sum_{k=1}^n H_k/k$. Hint: Consider
also the related sum $\sum_{k=1}^n H_{k-1}/k$.

27 Prove the gcd law (6.111) for Fibonacci numbers.

28 The Lucas number $L_n$ is defined to be $F_{n+1} + F_{n-1}$. Thus, according to
(6.109), we have $F_{2n} = F_nL_n$. Here is a table of the first few values:

$$\begin{array}{c|cccccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13\\ \hline
L_n & 2 & 1 & 3 & 4 & 7 & 11 & 18 & 29 & 47 & 76 & 123 & 199 & 322 & 521
\end{array}$$

a Use the repertoire method to show that the solution $Q_n$ to the general
recurrence

$$Q_0 = \alpha; \qquad Q_1 = \beta; \qquad Q_n = Q_{n-1} + Q_{n-2}, \quad n>1,$$

can be expressed in terms of $F_n$ and $L_n$.

b Find a closed form for $L_n$ in terms of $\phi$ and $\hat\phi$.

29 Prove Euler’s identity for continuants, equation (6.134). 

30 Generalize (6.136) to find an expression for the incremented continuant
$K(x_1, \ldots, x_{m-1}, x_m+y, x_{m+1}, \ldots, x_n)$, when $1\le m\le n$.


Homework exercises 

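31 Find a closed form for the coefficients $\left|{n\atop k}\right|$ in the representation of rising
powers by falling powers:

$$x^{\overline n} = \sum_k \left|{n\atop k}\right| x^{\underline k}, \qquad \text{integer } n\ge0.$$

(For example, $x^{\overline4} = x^{\underline4} + 12x^{\underline3} + 36x^{\underline2} + 24x^{\underline1}$, hence $\left|{4\atop 2}\right| = 36$.)

32 In Chapter 5 we obtained the formulas

$$\sum_{k\le m}\binom{r+k}{k} = \binom{r+m+1}{m}, \qquad \sum_{0\le k\le m}\binom{k}{r} = \binom{m+1}{r+1}$$

by unfolding the recurrence $\binom nk = \binom{n-1}{k} + \binom{n-1}{k-1}$ in two ways. What
identities appear when the analogous recurrence $\left\{{n\atop k}\right\} = k\left\{{n-1\atop k}\right\} + \left\{{n-1\atop k-1}\right\}$
is unwound?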
Ah! Those were
prime years.


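33 Table 264 gives the values of $\left\{{n\atop 2}\right\}$ and $\left[{n\atop 2}\right]$. What are closed forms (not
involving Stirling numbers) for the next cases, $\left\{{n\atop 3}\right\}$ and $\left[{n\atop 3}\right]$?

34 What are $\left\langle{-1\atop k}\right\rangle$ and $\left\langle{-2\atop k}\right\rangle$, if the basic recursion relation (6.35) is assumed
to hold for all integers $k$ and $n$, and if $\left\langle{n\atop k}\right\rangle = 0$ for all $k<0$?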

35 Prove that, for every $\epsilon>0$, there exists an integer $n>1$ (depending
on $\epsilon$) such that $H_n \bmod 1 < \epsilon$.

36 Is it possible to stack n bricks in such a way that the topmost brick is not 
above any point of the bottommost brick, yet a person who weighs the 
same as 100 bricks can balance on the middle of the top brick without
toppling the pile? 

37 Express $\sum_{k=1}^{mn}(k\bmod m)/k(k+1)$ in terms of harmonic numbers, assuming
that $m$ and $n$ are positive integers. What is the limiting value
as $n\to\infty$?

38 Find the indefinite sum $\sum\binom rk(-1)^kH_k\,\delta k$.

39 Express $\sum_{k=1}^n H_k^2$ in terms of $n$ and $H_n$.

40 Prove that 1979 divides the numerator of $\sum_{k=1}^{1319}(-1)^{k-1}/k$, and give a
similar result for 1987. Hint: Use Gauss’s trick to obtain a sum of
fractions whose numerators are 1979. See also exercise 4.

41 Evaluate the sum

$$\sum_k\binom{\lfloor(n+k)/2\rfloor}{k}$$

in closed form, when $n$ is an integer (possibly negative).

42 If $S$ is a set of integers, let $S+1$ be the “shifted” set $\{x+1 \mid x\in S\}$.
How many subsets of $\{1,2,\ldots,n\}$ have the property that $S\cup(S+1) =
\{1,2,\ldots,n+1\}$?

43 Prove that the infinite sum

$$.1 + .01 + .002 + .0003 + .00005 + .000008 + .0000013 + \cdots$$

converges to a rational number.
44 Prove the converse of Cassini’s identity (6.106): If $k$ and $m$ are integers
such that $|m^2-km-k^2| = 1$, then there is an integer $n$ such that $k = \pm F_n$
and $m = \pm F_{n+1}$.

45 Use the repertoire method to solve the general recurrence

$$X_0 = \alpha; \qquad X_1 = \beta; \qquad X_n = X_{n-1} + X_{n-2} + \gamma n + \delta.$$

46 What are $\cos 36^\circ$ and $\cos 72^\circ$?

47 Show that

$$2^{n-1}F_n = \sum_k\binom{n}{2k+1}5^k,$$

and use this identity to deduce the values of $F_p \bmod p$ and $F_{p+1}\bmod p$
when $p$ is prime.

48 Prove that zero-valued parameters can be removed from continuant polynomials
by collapsing their neighbors together:

$$K_n(x_1,\ldots,x_{m-1},0,x_{m+1},\ldots,x_n) = K_{n-2}(x_1,\ldots,x_{m-2},x_{m-1}+x_{m+1},x_{m+2},\ldots,x_n), \qquad 1<m<n.$$

49 Find the continued fraction representation of the number $\sum_{n\ge1}2^{-\lfloor n\phi\rfloor}$.

50 Define $f(n)$ for all positive integers $n$ by the recurrence

$$f(1) = 1; \qquad f(2n) = f(n); \qquad f(2n+1) = f(n) + f(n+1).$$

a For which $n$ is $f(n)$ even?

b Show that $f(n)$ can be expressed in terms of continuants.

Exam problems 

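51 Let $p$ be a prime number.

a Prove that $\left\{{p\atop k}\right\} \equiv \left[{p\atop k}\right] \equiv 0 \pmod p$, for $1<k<p$.

b Prove that $\left[{p-1\atop k}\right] \equiv 1 \pmod p$, for $1\le k<p$.

c Prove that $\left\{{2p-2\atop p}\right\} \equiv \left[{2p-2\atop p}\right] \equiv 0 \pmod p$, if $p>2$.

d Prove that if $p>3$ we have $\left[{p\atop 2}\right] \equiv 0 \pmod{p^2}$. Hint: Consider $p^{\underline p}$.

52 Let $H_n$ be written in lowest terms as $a_n/b_n$.

a Prove that $p\mid b_n \iff p\nmid a_{\lfloor n/p\rfloor}$, if $p$ is prime.

b Find all $n>0$ such that $a_n$ is divisible by 5.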
53 Find a closed form for $\sum_{k=0}^m{\binom nk}^{-1}(-1)^kH_k$, when $0\le m\le n$. Hint:
Exercise 5.42 has the sum without the $H_k$ factor.

54 Let $n>0$. The purpose of this exercise is to show that the denominator
of $B_{2n}$ is the product of all primes $p$ such that $(p-1)\mid(2n)$.

a Show that $S_m(p) + [(p-1)\mid m]$ is a multiple of $p$, when $p$ is prime
and $m>0$.

b Use the result of part (a) to show that

$$B_{2n} + \sum_{p\text{ prime}}\frac{[(p-1)\mid(2n)]}{p} = I_{2n} \text{ is an integer.}$$

Hint: It suffices to prove that, if $p$ is any prime, the denominator of
the fraction $B_{2n} + [(p-1)\mid(2n)]/p$ is not divisible by $p$.

c Prove that the denominator of $B_{2n}$ is always an odd multiple of 6,
and it is equal to 6 for infinitely many $n$.

55 Prove (6.70) as a corollary of a more general identity, by summing

$$\sum_{0\le k<n}\binom km\binom{x+k}{k}$$

and differentiating with respect to $x$.

56 Evaluate $\sum_{k\ne m}\binom nk(-1)^kk^{n+1}/(k-m)$ in closed form as a function of
the integers $m$ and $n$. (The sum is over all integers $k$ except for the value
$k = m$.)

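57 The “wraparound binomial coefficients of order 5” are defined by

$$\left(\!\!\binom nk\!\!\right) = \left(\!\!\binom{n-1}{k}\!\!\right) + \left(\!\!\binom{n-1}{(k-1)\bmod 5}\!\!\right), \qquad n>0,$$

and $\left(\!\binom 0k\!\right) = [k=0]$. Let $Q_n$ be the difference between the largest and
smallest of these numbers in row $n$:

$$Q_n = \max_{0\le k<5}\left(\!\!\binom nk\!\!\right) - \min_{0\le k<5}\left(\!\!\binom nk\!\!\right).$$

Find and prove a relation between $Q_n$ and the Fibonacci numbers.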

58 Find closed forms for $\sum_{n\ge0}F_n^2z^n$ and $\sum_{n\ge0}F_n^3z^n$. What do you deduce
about the quantity $F_{n+1}^3 - 4F_n^3 - F_{n-1}^3$?

59 Prove that if $m$ and $n$ are positive integers, there exists an integer $x$ such
that $F_x \equiv m \pmod{3^n}$.

60 Find all positive integers $n$ such that either $F_n + 1$ or $F_n - 1$ is a prime
number.
61 Prove the identity

$$\sum_{k=0}^n\frac{1}{F_{2^k}} = 3 - \frac{F_{2^n-1}}{F_{2^n}}, \qquad \text{integer } n\ge1.$$

What is $\sum_{k=0}^n 1/F_{3\cdot2^k}$?

62 Let $A_n = \phi^n + \phi^{-n}$ and $B_n = \phi^n - \phi^{-n}$.

a Find constants $\alpha$ and $\beta$ such that $A_n = \alpha A_{n-1} + \beta A_{n-2}$ and $B_n =
\alpha B_{n-1} + \beta B_{n-2}$ for all $n$.

b Express $A_n$ and $B_n$ in terms of $F_n$ and $L_n$ (see exercise 28).

c Prove that $\sum_{k=1}^n 1/(F_{2k+1}+1) = B_n/A_{n+1}$.

d Find a closed form for $\sum_{k=1}^n 1/(F_{2k+1}-1)$.

Bonus problems 

63 How many permutations $\pi_1\pi_2\ldots\pi_n$ of $\{1,2,\ldots,n\}$ have exactly $k$ indices
$j$ such that

a $\pi_i < \pi_j$ for all $i<j$? (Such $j$ are called “left-to-right maxima.”)

b $\pi_j > j$? (Such $j$ are called “excedances.”)

64 What is the denominator of $\left[{1/2\atop 1/2-n}\right]$, when this fraction is reduced to
lowest terms?


65 Prove the identity

$$\int_0^1\!\!\cdots\int_0^1 f(\lfloor x_1+\cdots+x_n\rfloor)\,dx_1\ldots dx_n = \sum_k\left\langle{n\atop k}\right\rangle\frac{f(k)}{n!}.$$

66 What is $\sum_k(-1)^k\left\langle{n\atop k}\right\rangle$, the $n$th alternating row sum of Euler’s triangle?

67 Prove that 



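68 Show that $\left\langle\!\!\left\langle{n\atop 1}\right\rangle\!\!\right\rangle = 2\left\langle{n\atop 1}\right\rangle$, and find a closed form for $\left\langle\!\!\left\langle{n\atop 2}\right\rangle\!\!\right\rangle$.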

69 Find a closed form for $\sum_{k=1}^n k^2H_{n+k}$.

70 Show that the complex harmonic numbers of exercise 22 have the power
series expansion $H_z = \sum_{n\ge2}(-1)^nH_\infty^{(n)}z^{n-1}$.

71 Prove that the generalized factorial of equation (5.83) can be written

$$\prod_{k\ge1}\left(1+\frac zk\right)e^{-z/k} = \frac{e^{-\gamma z}}{z!},$$

by considering the limit as $n\to\infty$ of the first $n$ factors of this infinite
product. Show that $\frac{d}{dz}(z!)$ is related to the general harmonic numbers
of exercise 22.


Bogus problems 
72 Prove that the tangent function has the power series (6.92), and find the
corresponding series for $z/\sin z$ and $\ln((\tan z)/z)$.

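73 Prove that $z\cot z$ is equal to

$$\frac{z}{2^n}\cot\frac{z}{2^n} - \frac{z}{2^n}\tan\frac{z}{2^n} + \sum_{k=1}^{2^{n-1}-1}\frac{z}{2^n}\left(\cot\frac{z+k\pi}{2^n} + \cot\frac{z-k\pi}{2^n}\right)$$

for all integers $n\ge1$, and show that the limit of the $k$th summand is
$2z^2/(z^2-k^2\pi^2)$ for fixed $k$ as $n\to\infty$.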

74 Find a relation between the numbers $T_n(1)$ and the coefficients of $1/\cos z$.

75 Prove that the tangent numbers and the coefficients of $1/\cos z$ appear at
the edges of the infinite triangle that begins as follows:

$$\begin{array}{ccccccc}
1\\
0 & 1\\
1 & 1 & 0\\
0 & 1 & 2 & 2\\
5 & 5 & 4 & 2 & 0\\
0 & 5 & 10 & 14 & 16 & 16\\
61 & 61 & 56 & 46 & 32 & 16 & 0
\end{array}$$

Each row contains partial sums of the previous row, going alternately left-to-right
and right-to-left. Hint: Consider the coefficients of the power
series $(\sin z + \cos z)/\cos(w+z)$.

76 Find a closed form for the sum

$$\sum_k(-1)^k\left\{{n\atop k}\right\}2^{n-k}k!,$$

and show that it is zero when $n$ is even.


77 When $m$ and $n$ are integers, $n\ge0$, the value of $\sigma_n(m)$ is given by (6.48)
if $m<0$, by (6.49) if $m>n$, and by (6.101) if $m=0$. Show that in the
remaining cases we have

$$\sigma_n(m) = \frac{(-1)^{m+n-1}}{m!\,(n-m)!}\sum_{k=0}^{m-1}\left[{m\atop m-k}\right]\frac{B_{n-k}}{n-k}, \qquad \text{integer } n\ge m>0.$$


78 Prove the following relation that connects Stirling numbers, Bernoulli 
numbers, and Catalan numbers: 



M) k 

k+l 



1 

TV + 1 ' 


79 Show that the four chessboard pieces of the 64 = 65 paradox can also be 
reassembled to prove that 64 = 63. 
80 A sequence defined by the recurrence $A_1 = x$, $A_2 = y$, and $A_n = A_{n-1} +
A_{n-2}$ has $A_m = 1000000$ for some $m$. What positive integers $x$ and $y$
make $m$ as large as possible?

81 The text describes a way to change a formula involving $F_{n\pm k}$ to a formula
that involves $F_n$ and $F_{n+1}$ only. Therefore it’s natural to wonder if two
such “reduced” formulas can be equal when they aren’t identical in form.
Let $P(x,y)$ be a polynomial in $x$ and $y$ with integer coefficients. Find a
necessary and sufficient condition that $P(F_{n+1},F_n) = 0$ for all $n\ge0$.

82 Explain how to add positive integers, working entirely in the Fibonacci 
number system. 

83 Is it possible that a sequence $\langle A_n\rangle$ satisfying the Fibonacci recurrence
$A_n = A_{n-1} + A_{n-2}$ can contain no prime numbers, if $A_0$ and $A_1$ are
relatively prime?

84 Let $m$ and $n$ be odd, positive integers. Find closed forms for

$$S^+_{m,n} = \sum_{k\ge0}\frac{1}{F_{2mk+n}+F_m}; \qquad S^-_{m,n} = \sum_{k\ge0}\frac{1}{F_{2mk+n}-F_m}.$$

Hint: The sums in exercise 62 are $S^+_{1,3} - S^+_{1,2n+3}$ and $S^-_{1,3} - S^-_{1,2n+3}$.

85 Characterize all $N$ such that the Fibonacci residues $F_n \bmod N$ for $n\ge0$
form the complete set $\{0,1,\ldots,N-1\}$. (See exercise 59.)

86 Let $C_1, C_2, \ldots$ be a sequence of nonzero integers such that

$$\gcd(C_m, C_n) = C_{\gcd(m,n)}$$

for all positive integers $m$ and $n$. Prove that the generalized binomial
coefficients

$$\binom nk_{\!\mathcal C} = \frac{C_nC_{n-1}\ldots C_{n-k+1}}{C_kC_{k-1}\ldots C_1}$$

are all integers. (In particular, the “Fibonomial coefficients” formed in
this way from Fibonacci numbers are integers, by (6.111).)

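87 Show that continuant polynomials appear in the matrix product

$$\begin{pmatrix}0&1\\1&x_1\end{pmatrix}\begin{pmatrix}0&1\\1&x_2\end{pmatrix}\ldots\begin{pmatrix}0&1\\1&x_n\end{pmatrix}$$

and in the determinant

$$\det\begin{pmatrix}
x_1 & 1 & 0 & 0 & \ldots & 0\\
-1 & x_2 & 1 & 0 & & \vdots\\
0 & -1 & x_3 & 1 & & \\
\vdots & & -1 & \ddots & \ddots & 0\\
 & & & \ddots & \ddots & 1\\
0 & \ldots & & 0 & -1 & x_n
\end{pmatrix}.$$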


88 Generalizing (6.146), find a continued fraction related to the generating
function $\sum_{n\ge1}z^{\lfloor n\alpha\rfloor}$, when $\alpha$ is any positive irrational number.

89 Let $\alpha$ be an irrational number in $(0\,.\,.\,1)$ and let $a_1, a_2, a_3, \ldots$ be
the partial quotients in its continued fraction representation. Show that
$|D(\alpha,n)| < 2$ when $n = K(a_1,\ldots,a_m)$, where $D$ is the discrepancy
defined in Chapter 3.

90 Let $Q_n$ be the largest denominator on level $n$ of the Stern-Brocot tree.
(Thus $\langle Q_0, Q_1, Q_2, Q_3, Q_4, \ldots\rangle = \langle 1, 2, 3, 5, 8, \ldots\rangle$ according to the diagram
in Chapter 4.) Prove that $Q_n = F_{n+2}$.

Research problems 

91 What is the best way to extend the definition of $\left\{{n\atop k}\right\}$ to arbitrary real
values of $n$ and $k$?

92 Let $H_n$ be written in lowest terms as $a_n/b_n$, as in exercise 52.

a Are there infinitely many $n$ with $11\mid a_n$?

b Are there infinitely many $n$ with $b_n = \operatorname{lcm}(1,2,\ldots,n)$? (Two such
values are $n = 250$ and $n = 1000$.)

93 Prove that $\gamma$ and $e^\gamma$ are irrational.

94 Develop a general theory of the solutions to the two-parameter recurrence

$$\left|{n\atop k}\right| = (\alpha n+\beta k+\gamma)\left|{n-1\atop k}\right| + (\alpha'n+\beta'k+\gamma')\left|{n-1\atop k-1}\right| + [n=k=0], \qquad \text{for } n,k\ge0,$$

assuming that $\left|{n\atop k}\right| = 0$ when $n<0$ or $k<0$. (Binomial coefficients,
Stirling numbers, Eulerian numbers, and the sequences of exercises 17
and 31 are special cases.) What special values $(\alpha,\beta,\gamma,\alpha',\beta',\gamma')$ yield
“fundamental solutions” in terms of which the general solution can be
expressed?

95 Find an efficient way to extend the Gosper-Zeilberger algorithm from
hypergeometric terms to terms that may involve Stirling numbers.
Generating Functions


THE MOST POWERFUL WAY to deal with sequences of numbers, as far
as anybody knows, is to manipulate infinite series that “generate” those
sequences. We’ve learned a lot of sequences and we’ve seen a few generating
functions; now we’re ready to explore generating functions in depth, and to
see how remarkably useful they are.


7.1 DOMINO THEORY AND CHANGE 


“Let me count the
ways.”

— E. B. Browning 


Generating functions are important enough, and for many of us new 
enough, to justify a relaxed approach as we begin to look at them more closely. 
So let’s start this chapter with some fun and games as we try to develop our 
intuitions about generating functions. We will study two applications of the 
ideas, one involving dominoes and the other involving coins. 

How many ways $T_n$ are there to completely cover a $2\times n$ rectangle with
$2\times1$ dominoes? We assume that the dominoes are identical (either because
they’re face down, or because someone has rendered them indistinguishable,
say by painting them all red); thus only their orientations (vertical or horizontal)
matter, and we can imagine that we’re working with domino-shaped
tiles. For example, there are three tilings of a $2\times3$ rectangle, namely ▯▯▯, ▯⊟,
and ⊟▯ (where ▯ is a vertical domino and ⊟ is a pair of horizontal dominoes,
one above the other); so $T_3 = 3$.

To find a closed form for general $T_n$ we do our usual first thing, look at
small cases. When $n = 1$ there’s obviously just one tiling, ▯; and when $n = 2$
there are two, ▯▯ and ⊟.

How about when $n = 0$; how many tilings of a $2\times0$ rectangle are there?
It’s not immediately clear what this question means, but we’ve seen similar
situations before: There is one permutation of zero objects (namely the empty
permutation), so $0! = 1$. There is one way to choose zero things from $n$ things
(namely to choose nothing), so $\binom n0 = 1$. There is one way to partition the
empty set into zero nonempty subsets, but there are no such ways to partition
a nonempty set; so $\left\{{n\atop 0}\right\} = [n=0]$. By such reasoning we can conclude that
there’s just one way to tile a $2\times0$ rectangle with dominoes, namely to use
no dominoes; therefore $T_0 = 1$. (This spoils the simple pattern $T_n = n$ that
holds when $n = 1$, 2, and 3; but that pattern was probably doomed anyway,
since $T_0$ wants to be 1 according to the logic of the situation.) A proper
understanding of the null case turns out to be useful whenever we want to 
solve an enumeration problem. 

Let’s look at one more small case, $n = 4$. There are two possibilities for
tiling the left edge of the rectangle: we put either a vertical domino or two
horizontal dominoes there. If we choose a vertical one, the partial solution is
▯ and the remaining $2\times3$ rectangle can be covered in $T_3$ ways. If we choose
two horizontals, the partial solution ⊟ can be completed in $T_2$ ways. Thus
$T_4 = T_3 + T_2 = 5$. (The five tilings are ▯▯▯▯, ▯▯⊟, ▯⊟▯, ⊟▯▯, and ⊟⊟.)

We now know the first five values of $T_n$:

$$\begin{array}{c|ccccc}
n & 0 & 1 & 2 & 3 & 4\\ \hline
T_n & 1 & 1 & 2 & 3 & 5
\end{array}$$


These look suspiciously like the Fibonacci numbers, and it’s not hard to see
why: The reasoning we used to establish $T_4 = T_3 + T_2$ easily generalizes to
$T_n = T_{n-1} + T_{n-2}$, for $n\ge2$. Thus we have the same recurrence here as for
the Fibonacci numbers, except that the initial values $T_0 = 1$ and $T_1 = 1$ are a
little different. But these initial values are the consecutive Fibonacci numbers
$F_1$ and $F_2$, so the $T$’s are just Fibonacci numbers shifted up one place:

$$T_n = F_{n+1}, \qquad \text{for } n\ge0.$$

(We consider this to be a closed form for $T_n$, because the Fibonacci numbers
are important enough to be considered “known.” Also, $F_n$ itself has a closed
form (6.123) in terms of algebraic operations.) Notice that this equation
confirms the wisdom of setting $T_0 = 1$.
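As an independent check on $T_n = F_{n+1}$, here is a small brute-force counter in Python. It is our own sketch, not from the text. It fills a `rows` × `cols` board cell by cell, always covering the first empty cell in row-major order with either a horizontal or a vertical domino. So it never relies on the left-edge argument that produced the recurrence.

```python
def count_tilings(rows, cols):
    """Count domino tilings of a rows x cols board by backtracking.
    The first empty cell (row-major order) has filled cells above and to its left,
    so a domino covering it must extend either right or down."""
    filled = [[False] * cols for _ in range(rows)]

    def first_empty():
        for r in range(rows):
            for c in range(cols):
                if not filled[r][c]:
                    return r, c
        return None

    def search():
        cell = first_empty()
        if cell is None:
            return 1                          # every cell covered: one complete tiling
        r, c = cell
        total = 0
        for dr, dc in ((0, 1), (1, 0)):       # horizontal domino, then vertical domino
            r2, c2 = r + dr, c + dc
            if r2 < rows and c2 < cols and not filled[r2][c2]:
                filled[r][c] = filled[r2][c2] = True
                total += search()
                filled[r][c] = filled[r2][c2] = False
        return total

    return search()

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

T = [count_tilings(2, n) for n in range(11)]
print(T)                                             # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(all(T[n] == fib(n + 1) for n in range(11)))    # True
```

The empty board counts as one tiling, so `count_tilings(2, 0)` returns 1, in agreement with $T_0 = 1$.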

But what does all this have to do with generating functions? Well, we’re
about to get to that: there’s another way to figure out what $T_n$ is. This new
way is based on a bold idea. Let’s consider the “sum” of all possible $2\times n$
tilings, for all $n\ge0$, and call it $T$:

$$T = \text{|} + \text{▯} + \text{▯▯} + \text{⊟} + \text{▯▯▯} + \text{▯⊟} + \text{⊟▯} + \cdots. \tag{7.1}$$

(The first term ‘|’ on the right stands for the null tiling of a $2\times0$ rectangle.)
This sum T represents lots of information. It’s useful because it lets us prove 
things about T as a whole rather than forcing us to prove them (by induction) 
about its individual terms. 

The terms of this sum stand for tilings, which are combinatorial objects. 
We won’t be fussy about what’s considered legal when infinitely many tilings 


To boldly go 
where no tiling has 
gone before. 
are added together; everything can be made rigorous, but our goal right now
is to expand our consciousness beyond conventional algebraic formulas.

We’ve added the patterns together, and we can also multiply them, by
juxtaposition. For example, we can multiply the tilings ▯ and ⊟ to get the
new tiling ▯⊟. But notice that multiplication is not commutative; that is, the
order of multiplication counts: ▯⊟ is different from ⊟▯.

Using this notion of multiplication it’s not hard to see that the null
tiling plays a special role: it is the multiplicative identity. For instance,
$\text{|}\times\text{▯⊟} = \text{▯⊟}\times\text{|} = \text{▯⊟}$.

Now we can use domino arithmetic to manipulate the infinite sum $T$:

$$\begin{aligned}
T &= \text{|} + \text{▯} + \text{▯▯} + \text{⊟} + \text{▯▯▯} + \text{▯⊟} + \text{⊟▯} + \cdots\\
&= \text{|} + \text{▯}\,(\text{|} + \text{▯} + \text{▯▯} + \text{⊟} + \cdots) + \text{⊟}\,(\text{|} + \text{▯} + \text{▯▯} + \text{⊟} + \cdots)\\
&= \text{|} + \text{▯}\,T + \text{⊟}\,T.
\end{aligned} \tag{7.2}$$

Every valid tiling occurs exactly once in each right side, so what we’ve done is
reasonable even though we’re ignoring the cautions in Chapter 2 about “absolute
convergence.” The bottom line of this equation tells us that everything
in $T$ is either the null tiling, or is a vertical tile followed by something else
in $T$, or is two horizontal tiles followed by something else in $T$.

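So now let’s try to solve the equation for $T$. Replacing the $T$ on the left
by $\text{|}\,T$ and subtracting the last two terms on the right from both sides of the
equation, we get

$$(\text{|} - \text{▯} - \text{⊟})\,T = \text{|}. \tag{7.3}$$

For a consistency check, here’s an expanded version:

$$\begin{array}{l}
\text{|} + \text{▯} + \text{▯▯} + \text{⊟} + \text{▯▯▯} + \text{▯⊟} + \text{⊟▯} + \cdots\\
\quad{} - \text{▯} - \text{▯▯} - \text{▯▯▯} - \text{▯⊟} - \text{▯▯▯▯} - \text{▯▯⊟} - \text{▯⊟▯} - \cdots\\
\quad{} - \text{⊟} - \text{⊟▯} - \text{⊟▯▯} - \text{⊟⊟} - \text{⊟▯▯▯} - \text{⊟▯⊟} - \text{⊟⊟▯} - \cdots\\
{} = \text{|}
\end{array}$$

Every term in the top row, except the first, is cancelled by a term in either
the second or third row, so our equation is correct.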
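The domino arithmetic can be imitated with formal sums whose terms are strings. This encoding is an assumption of ours, sketched in Python:

- 'V' stands for a vertical domino, 'H' for a stacked pair of horizontal dominoes, and the empty string for the null tiling.
- Multiplication is concatenation, so it is noncommutative.
- Everything is truncated to tilings of width at most `W`.

Multiplying the truncated $T$ by `| − V − H` leaves only the null tiling, in agreement with (7.3).

```python
from collections import defaultdict

W = 8   # keep only tilings of total width <= W

def width(word):
    # 'V' (vertical domino) has width 1; 'H' (stacked horizontal pair) has width 2
    return sum(1 if piece == "V" else 2 for piece in word)

def mul(a, b):
    """Noncommutative product of formal sums {word: coefficient}: juxtapose u, then v."""
    out = defaultdict(int)
    for u, cu in a.items():
        for v, cv in b.items():
            if width(u) + width(v) <= W:
                out[u + v] += cu * cv
    return {word: c for word, c in out.items() if c != 0}

def all_tilings(max_width):
    words, frontier = [""], [""]          # "" is the null tiling
    while frontier:
        frontier = [w + p for w in frontier for p in "VH" if width(w + p) <= max_width]
        words += frontier
    return words

T = {w: 1 for w in all_tilings(W)}
one_minus = {"": 1, "V": -1, "H": -1}     # | - V - H
print(mul(one_minus, T))                  # {'': 1}
print(mul({"V": 1}, {"H": 1}), mul({"H": 1}, {"V": 1}))   # {'VH': 1} {'HV': 1}
```

Truncation is harmless here. Every nonempty word of width at most `W` cancels against the two shorter products, and products wider than `W` are simply discarded.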

So far it’s been fairly easy to make combinatorial sense of the equations
we’ve been working with. Now, however, to get a compact expression for $T$
we cross a combinatorial divide. With a leap of algebraic faith we divide both
sides of equation (7.3) by $\text{|} - \text{▯} - \text{⊟}$ to get

$$T = \frac{\text{|}}{\text{|} - \text{▯} - \text{⊟}}. \tag{7.4}$$

I have a gut feeling
that these sums must
converge, as long as
the dominoes are
small enough.
(Multiplication isn’t commutative, so we’re on the verge of cheating, by not
distinguishing between left and right division. In our application it doesn’t
matter, because ‘|’ commutes with everything. But let’s not be picky, unless
our wild ideas lead to paradoxes.)

The next step is to expand this fraction as a power series, using the rule

$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + \cdots.$$

The null tiling ‘|’, which is the multiplicative identity for our combinatorial
arithmetic, plays the part of 1, the usual multiplicative identity; and ▯ + ⊟
plays $z$. So we get the expansion

$$\begin{aligned}
\frac{\text{|}}{\text{|} - \text{▯} - \text{⊟}} &= \text{|} + (\text{▯} + \text{⊟}) + (\text{▯} + \text{⊟})^2 + (\text{▯} + \text{⊟})^3 + \cdots\\
&= \text{|} + (\text{▯} + \text{⊟}) + (\text{▯▯} + \text{▯⊟} + \text{⊟▯} + \text{⊟⊟})\\
&\qquad + (\text{▯▯▯} + \text{▯▯⊟} + \text{▯⊟▯} + \text{▯⊟⊟} + \text{⊟▯▯} + \text{⊟▯⊟} + \text{⊟⊟▯} + \text{⊟⊟⊟}) + \cdots.
\end{aligned}$$

This is $T$, but the tilings are arranged in a different order than we had before.
Every tiling appears exactly once in this sum; for example, ▯⊟⊟▯▯⊟▯ appears
in the expansion of $(\text{▯} + \text{⊟})^7$.

We can get useful information from this infinite sum by compressing it
down, ignoring details that are not of interest. For example, we can imagine
that the patterns become unglued and that the individual dominoes commute
with each other; then a term like ▯⊟⊟▯▯⊟▯ becomes ▯⁴▭⁶, because it contains
four verticals and six horizontals. Collecting like terms gives us the series

$$T = \text{|} + \text{▯} + \text{▯}^2 + \text{▭}^2 + \text{▯}^3 + 2\,\text{▯}\,\text{▭}^2 + \text{▯}^4 + 3\,\text{▯}^2\text{▭}^2 + \text{▭}^4 + \cdots.$$

The $2\,\text{▯}\,\text{▭}^2$ here represents the two terms of the old expansion, ▯⊟ and ⊟▯, that
have one vertical and two horizontal dominoes; similarly $3\,\text{▯}^2\text{▭}^2$ represents the
three terms ▯▯⊟, ▯⊟▯, and ⊟▯▯. We’re essentially treating ▯ and ▭ as ordinary
(commutative) variables.

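We can find a closed form for the coefficients in the commutative version
of $T$ by using the binomial theorem:

$$\begin{aligned}
\frac{\text{|}}{\text{|} - (\text{▯} + \text{▭}^2)} &= \text{|} + (\text{▯} + \text{▭}^2) + (\text{▯} + \text{▭}^2)^2 + (\text{▯} + \text{▭}^2)^3 + \cdots\\
&= \sum_{k\ge0}(\text{▯} + \text{▭}^2)^k\\
&= \sum_{j,k\ge0}\binom kj\,\text{▯}^j\,\text{▭}^{2k-2j}\\
&= \sum_{j,m\ge0}\binom{j+m}{j}\,\text{▯}^j\,\text{▭}^{2m}.
\end{aligned} \tag{7.5}$$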
Now I'm
disoriented.


(The last step replaces $k-j$ by $m$; this is legal because we have $\binom kj = 0$ when
$0\le k<j$.) We conclude that $\binom{j+m}{j}$ is the number of ways to tile a $2\times(j+2m)$
rectangle with $j$ vertical dominoes and $2m$ horizontal dominoes. For example,
we recently looked at the $2\times10$ tiling ▯⊟⊟▯▯⊟▯, which involves four verticals
and six horizontals; there are $\binom{4+3}{4} = 35$ such tilings in all, so one of the
terms in the commutative version of $T$ is $35\,\text{▯}^4\text{▭}^6$.
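Equation (7.5) can also be checked by machine. This Python sketch uses names of our own. It expands $\sum_{k=0}^{K}(v + h^2)^k$ as an ordinary commutative polynomial, writing $v$ for a vertical domino and $h$ for a horizontal one. It then compares each coefficient with a direct count of tilings that use $j$ vertical dominoes and $m$ pairs of horizontals. Stopping at $k = K$ is harmless for monomials with $j + m \le K$, because $v^jh^{2m}$ arises only from the term with $k = j + m$.

```python
from collections import Counter
from math import comb

K = 8   # expand sum_{k=0}^{K} (v + h^2)^k; exact for every v^j h^(2m) with j + m <= K

def poly_mul(p, q):
    """Product of commutative polynomials stored as {(j, e): coeff} for v^j h^e."""
    out = Counter()
    for (j1, e1), c1 in p.items():
        for (j2, e2), c2 in q.items():
            out[(j1 + j2, e1 + e2)] += c1 * c2
    return out

step = {(1, 0): 1, (0, 2): 1}      # v + h^2: one vertical domino, or a pair of horizontals
power = Counter({(0, 0): 1})       # (v + h^2)^0
series = Counter()
for _ in range(K + 1):
    series.update(power)           # add (v + h^2)^k to the running sum
    power = poly_mul(power, step)

def tilings(n):
    """All 2 x n tilings, as strings over 'V' (vertical) and 'H' (stacked horizontal pair)."""
    if n < 0:
        return []
    if n == 0:
        return [""]
    return ["V" + t for t in tilings(n - 1)] + ["H" + t for t in tilings(n - 2)]

for j in range(K + 1):
    for m in range(K + 1 - j):
        direct = sum(1 for t in tilings(j + 2 * m) if t.count("V") == j)
        assert series[(j, 2 * m)] == direct == comb(j + m, j)

print(series[(4, 6)])   # 35
```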

We can suppress even more detail by ignoring the orientation of the
dominoes. Suppose we don’t care about the horizontal/vertical breakdown;
we only want to know about the total number of $2\times n$ tilings. (This, in
fact, is the number $T_n$ we started out trying to discover.) We can collect
the necessary information by simply substituting a single quantity, $z$, for ▯
and ▭. And we might as well also replace ‘|’ by 1, getting

$$T = \frac{1}{1-z-z^2}. \tag{7.6}$$

This is the generating function (6.117) for Fibonacci numbers, except for a
missing factor of $z$ in the numerator; so we conclude that the coefficient of $z^n$
in $T$ is $F_{n+1}$.
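Extracting the coefficients of $1/(1-z-z^2)$ is a matter of series division. The minimal Python sketch below relies only on the identity $\text{den}(z)\,q(z) = 1$, which fixes each coefficient of $q$ in turn; the function name is ours.

```python
def reciprocal_series(den, n_terms):
    """First n_terms coefficients of 1/den(z), where den is a coefficient list with den[0] == 1.
    From den(z) * q(z) = 1 we get q[n] = -(den[1]*q[n-1] + den[2]*q[n-2] + ...)."""
    q = [0] * n_terms
    q[0] = 1
    for n in range(1, n_terms):
        q[n] = -sum(den[k] * q[n - k] for k in range(1, min(n, len(den) - 1) + 1))
    return q

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

coeffs = reciprocal_series([1, -1, -1], 15)          # 1 / (1 - z - z^2)
print(coeffs)   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
print(coeffs == [fib(n + 1) for n in range(15)])     # True
```

For this denominator the update is just $q_n = q_{n-1} + q_{n-2}$, which is the recurrence for $T_n$ again.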

The compact representations $\text{|}/(\text{|} - \text{▯} - \text{⊟})$, $\text{|}/(\text{|} - \text{▯} - \text{▭}^2)$, and $1/(1-z-z^2)$
that we have deduced for $T$ are called generating functions, because they
generate the coefficients of interest.

Incidentally, our derivation implies that the number of $2\times n$ domino
tilings with exactly $m$ pairs of horizontal dominoes is $\binom{n-m}{m}$. (This follows
because there are $j = n-2m$ vertical dominoes, hence there are

$$\binom{j+m}{j} = \binom{j+m}{m} = \binom{n-m}{m}$$

ways to do the tiling according to our formula.) We observed in Chapter 6
that $\binom{n-m}{m}$ is the number of Morse code sequences of length $n$ that contain
$m$ dashes; in fact, it’s easy to see that $2\times n$ domino tilings correspond directly
to Morse code sequences. (The tiling ▯⊟⊟▯▯⊟▯ corresponds to ‘•––••–•’.)
Thus domino tilings are closely related to the continuant polynomials we
studied in Chapter 6. It’s a small world.
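The correspondence with Morse code is just a change of alphabet. This Python sketch is ours and its names are illustrative. It maps a vertical domino ▯ to a dot and a horizontal pair ⊟ to a dash. It then checks, for small $n$, that exactly $\binom{n-m}{m}$ sequences of length $n$ contain $m$ dashes. Summing over $m$ then gives $F_{n+1}$.

```python
from math import comb

def morse(n):
    """All Morse code sequences of total length n: a dot has length 1, a dash length 2."""
    if n < 0:
        return []
    if n == 0:
        return [""]
    return ["•" + s for s in morse(n - 1)] + ["–" + s for s in morse(n - 2)]

def tiling_to_morse(tiling):
    """Read a 2 x n tiling left to right: vertical domino -> dot, horizontal pair -> dash."""
    return "".join("•" if piece == "▯" else "–" for piece in tiling)

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

print(tiling_to_morse("▯⊟⊟▯▯⊟▯"))   # •––••–•

for n in range(16):
    seqs = morse(n)
    for m in range(n // 2 + 1):
        assert sum(1 for s in seqs if s.count("–") == m) == comb(n - m, m)
    assert len(seqs) == sum(comb(n - m, m) for m in range(n // 2 + 1)) == fib(n + 1)
```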

We have solved the $T_n$ problem in two ways. The first way, guessing the
answer and proving it by induction, was easier; the second way, using infinite 
sums of domino patterns and distilling out the coefficients of interest, was 
fancier. But did we use the second method only because it was amusing to 
play with dominoes as if they were algebraic variables? No; the real reason 
for introducing the second way was that the infinite-sum approach is a lot 
more powerful. The second method applies to many more problems, because 
it doesn’t require us to make magic guesses. 
Let’s generalize up a notch, to a problem where guesswork will be beyond
us. How many ways $U_n$ are there to tile a $3\times n$ rectangle with dominoes?

The first few cases of this problem tell us a little: The null tiling gives
$U_0 = 1$. There is no valid tiling when n = 1, since a 2 × 1 domino doesn’t fill
a 3 × 1 rectangle, and since there isn’t room for two. The next case, n = 2,
can easily be done by hand; there are three tilings, so $U_2 = 3$.
(Come to think of it we already knew this, because the previous problem told
us that $T_3 = 3$; the number of ways to tile a 3 × 2 rectangle is the same as the
number to tile a 2 × 3.) When n = 3, as when n = 1, there are no tilings. We
can convince ourselves of this either by making a quick exhaustive search or
by looking at the problem from a higher level: The area of a 3 × 3 rectangle is
odd, so we can’t possibly tile it with dominoes whose area is even. (The same
argument obviously applies to any odd n.) Finally, when n = 4 there seem
to be about a dozen tilings; it’s difficult to be sure about the exact number
without spending a lot of time to guarantee that the list is complete.

So let’s try the infinite-sum approach that worked last time: 

$$U = | + (\text{the three } 3\times2 \text{ tilings}) + (\text{the } 3\times4 \text{ tilings}) + \cdots. \qquad (7.7)$$

Every non-null tiling begins in one of three ways: with a vertical domino
above a horizontal one, with a horizontal domino above a vertical one, or
with three horizontal dominoes stacked up. Let’s write these starting pieces
as ⟨▯▭⟩, ⟨▭▯⟩, and ⟨≡⟩. Unfortunately the first two of these three
possibilities don’t simply factor out and leave us with U again. The sum of
all terms in U that begin with ⟨▯▭⟩ can, however, be written as ⟨▯▭⟩V, where

$$V = ▯ + \cdots$$

is the sum of all domino tilings of a mutilated 3 × n rectangle that has its
lower left corner missing. Similarly, the terms of U that begin with ⟨▭▯⟩ can be
written ⟨▭▯⟩Λ, where

$$Λ = ▯ + \cdots$$

consists of all rectangular tilings lacking their upper left corner. The series Λ
is a mirror image of V. These factorizations allow us to write

$$U = | + ⟨▯▭⟩V + ⟨▭▯⟩Λ + ⟨≡⟩U.$$

And we can factor V and Λ as well, because such tilings can begin in only
two ways:

$$V = ▯U + ⟨▭^3⟩V, \qquad Λ = ▯U + ⟨▭^3⟩'Λ,$$

where ⟨▭³⟩ stands for a staggered block of three horizontal dominoes (two
filling the first column of V, and a third just below them, one column to the
right) and ⟨▭³⟩′ is its mirror image.

Now we have three equations in three unknowns (U, V, and Λ). We can solve
them by first solving for V and Λ in terms of U, then plugging the results
into the equation for U:

$$V = (| - ⟨▭^3⟩)^{-1}▯U, \qquad Λ = (| - ⟨▭^3⟩')^{-1}▯U;$$

$$U = | + ⟨▯▭⟩(| - ⟨▭^3⟩)^{-1}▯U + ⟨▭▯⟩(| - ⟨▭^3⟩')^{-1}▯U + ⟨≡⟩U.$$

And the final equation can be solved for U, giving the compact formula

$$U = \frac{|}{| - ⟨▯▭⟩(| - ⟨▭^3⟩)^{-1}▯ - ⟨▭▯⟩(| - ⟨▭^3⟩')^{-1}▯ - ⟨≡⟩}. \qquad (7.8)$$

I learned in another class about “regular expressions.” If I’m not mistaken,
we can write U = (⟨▯▭⟩⟨▭³⟩*▯ + ⟨▭▯⟩⟨▭³⟩′*▯ + ⟨≡⟩)* in the language of
regular expressions; so there must be some connection between regular
expressions and generating functions.


This expression defines the infinite sum U, just as (7.4) defines T. 

The next step is to go commutative. Everything simplifies beautifully
when we detach all the dominoes and use only powers of ▯ and ▭:

$$\begin{aligned}
U &= \frac{1}{1 - ▯^2▭(1-▭^3)^{-1} - ▯^2▭(1-▭^3)^{-1} - ▭^3}
   = \frac{1-▭^3}{(1-▭^3)^2 - 2▯^2▭}
   = \frac{(1-▭^3)^{-1}}{1 - 2▯^2▭(1-▭^3)^{-2}}\\
&= \frac{1}{1-▭^3} + \frac{2▯^2▭}{(1-▭^3)^3} + \frac{4▯^4▭^2}{(1-▭^3)^5} + \frac{8▯^6▭^3}{(1-▭^3)^7} + \cdots\\
&= \sum_{k\ge0}\frac{2^k▯^{2k}▭^k}{(1-▭^3)^{2k+1}}
 = \sum_{k,m\ge0}\binom{m+2k}{m}2^k▯^{2k}▭^{k+3m}.
\end{aligned}$$

(This derivation deserves careful scrutiny. The last step uses the formula
$(1-w)^{-2k-1} = \sum_m\binom{m+2k}{m}w^m$, identity (5.56).) Let’s take a good look at
the bottom line to see what it tells us. First, it says that every 3 × n tiling
uses an even number of vertical dominoes. Moreover, if there are 2k verticals,
there must be at least k horizontals, and the total number of horizontals must
be k + 3m for some m ≥ 0. Finally, the number of possible tilings with 2k
verticals and k + 3m horizontals is exactly $\binom{m+2k}{m}2^k$.

We now are able to analyze the 3 × 4 tilings that left us doubtful when we
began looking at the 3 × n problem. When n = 4 the total area is 12, so we
need six dominoes altogether. There are 2k verticals and k + 3m horizontals,
for some k and m; hence 2k + k + 3m = 6. In other words, k + m = 2.
If we use no verticals, then k = 0 and m = 2; the number of possibilities
is $\binom{2+0}{2}2^0 = 1$. (This accounts for the tiling made of two stacks of three
horizontal dominoes.) If we use two verticals, then k = 1 and m = 1; there are
$\binom{1+2}{1}2^1 = 6$ such tilings. And if we use four verticals, then k = 2 and m = 0;
there are $\binom{0+4}{0}2^2 = 4$ such tilings, making a total of $U_4 = 11$. In general if
n is even, this reasoning shows that k + m = n/2, hence $\binom{m+2k}{m} = \binom{n/2+k}{n/2-k}$
and the total number of 3 × n tilings is

$$U_n = \sum_k\binom{n/2+k}{n/2-k}2^k = \sum_m\binom{n-m}{m}2^{n/2-m}. \qquad (7.9)$$

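To double-check (7.9) and the count $\binom{m+2k}{m}2^k$ of tilings with 2k verticals, here is a brute-force sketch. It is our own column-by-column search, not the method of the text, and the function names are hypothetical. It enumerates 3 × n tilings and tallies them by the number of vertical dominoes.

```python
from collections import defaultdict
from math import comb

ROWS = 3  # height of the rectangle

def fill_column(mask, row, next_mask, verticals, out):
    """Finish one column whose pre-covered cells are the set bits of `mask`.

    Each completed filling is appended to `out` as
    (cells covered in the next column, running count of vertical dominoes).
    """
    if row == ROWS:
        out.append((next_mask, verticals))
        return
    if mask & (1 << row):  # already covered by a horizontal from the left
        fill_column(mask, row + 1, next_mask, verticals, out)
        return
    # a horizontal domino that sticks into the next column
    fill_column(mask, row + 1, next_mask | (1 << row), verticals, out)
    # a vertical domino covering this cell and the one below it
    if row + 1 < ROWS and not mask & (1 << (row + 1)):
        fill_column(mask | (1 << (row + 1)), row + 1, next_mask, verticals + 1, out)

def tilings_by_verticals(n):
    """Map v -> number of 3 x n domino tilings using exactly v vertical dominoes."""
    states = {(0, 0): 1}  # (pre-covered mask, verticals so far) -> partial tilings
    for _ in range(n):
        new_states = defaultdict(int)
        for (mask, v), count in states.items():
            moves = []
            fill_column(mask, 0, 0, v, moves)
            for next_mask, v2 in moves:
                new_states[(next_mask, v2)] += count
        states = new_states
    # keep only tilings where nothing sticks out past column n
    return {v: count for (mask, v), count in states.items() if mask == 0}

def u_formula(n):
    """Equation (7.9): U_n = sum_k C(n/2 + k, n/2 - k) 2^k for even n; 0 for odd n."""
    if n % 2:
        return 0
    h = n // 2
    return sum(comb(h + k, h - k) * 2**k for k in range(h + 1))

for n in range(13):
    counts = tilings_by_verticals(n)
    assert sum(counts.values()) == u_formula(n)
    for v, c in counts.items():
        k = v // 2
        assert v % 2 == 0 and c == comb(n // 2 + k, n // 2 - k) * 2**k
```

For n = 4 the search reports one tiling with no verticals, six with two, and four with four, matching the hand analysis above.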

As before, we can also substitute z for both ▯ and ▭, getting a generating
function that doesn’t discriminate between dominoes of particular
persuasions. The result is

$$U = \frac{1}{1 - z^3(1-z^3)^{-1} - z^3(1-z^3)^{-1} - z^3} = \frac{1-z^3}{1-4z^3+z^6}. \qquad (7.10)$$


If we expand this quotient into a power series, we get

$$U = 1 + U_2z^3 + U_4z^6 + U_6z^9 + U_8z^{12} + \cdots,$$

a generating function for the numbers $U_n$. (There’s a curious mismatch between
subscripts and exponents in this formula, but it is easily explained. The
coefficient of $z^9$, for example, is $U_6$, which counts the tilings of a 3 × 6
rectangle. This is what we want, because every such tiling contains nine dominoes.)

We could proceed to analyze (7.10) and get a closed form for the coeffi- 
cients, but it’s better to save that for later in the chapter after we’ve gotten 
more experience. So let’s divest ourselves of dominoes for the moment and 
proceed to the next advertised problem, “change.” 
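Expanding (7.10) numerically is a useful sanity check on the 3 × n counts. The sketch below uses a helper `series_div` of our own, which assumes integer coefficients and a denominator with constant term 1. It performs power-series long division one coefficient at a time.

```python
def series_div(num, den, terms):
    """First `terms` coefficients of num(z)/den(z) as a power series.

    Assumes den[0] == 1, so each coefficient follows from
    c_n = num_n - sum_{j>=1} den_j * c_{n-j}.
    """
    assert den[0] == 1
    out = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * out[n - j]
        out.append(c)
    return out

# (7.10): U(z) = (1 - z^3) / (1 - 4z^3 + z^6)
coeffs = series_div([1, 0, 0, -1], [1, 0, 0, -4, 0, 0, 1], 19)
print([coeffs[3 * j] for j in range(7)])   # [1, 3, 11, 41, 153, 571, 2131]
```

The coefficient of $z^{3j}$ is $U_{2j}$, and all other coefficients vanish, in line with the subscript/exponent mismatch explained above.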

How many ways are there to pay 50 cents? We assume that the payment 
must be made with pennies ①, nickels ⑤, dimes ⑩, quarters ㉕, and half-dollars ㊿.
George Pólya [298] popularized this problem by showing that it
can be solved with generating functions in an instructive way. 

Let’s set up infinite sums that represent all possible ways to give change, 
just as we tackled the domino problems by working with infinite sums that 
represent all possible domino patterns. It’s simplest to start by working with 
fewer varieties of coins, so let’s suppose first that we have nothing but pennies. 
The sum of all ways to leave some number of pennies (but just pennies) in 
change can be written 


Ah yes, I remember 
when we had half- 
dollars. 


$$P = ∅ + ① + ①① + ①①① + ①①①① + \cdots = ∅ + ① + ①^2 + ①^3 + ①^4 + \cdots.$$

Coins of the realm.


The first term stands for the way to leave no pennies, the second term stands 
for one penny, then two pennies, three pennies, and so on. Now if we’re 
allowed to use both pennies and nickels, the sum of all possible ways is 

$$N = P + ⑤P + ⑤⑤P + ⑤⑤⑤P + ⑤⑤⑤⑤P + \cdots = (∅ + ⑤ + ⑤^2 + ⑤^3 + ⑤^4 + \cdots)P,$$

since each payment has a certain number of nickels chosen from the first
factor and a certain number of pennies chosen from P. (Notice that N is
not the sum $∅ + ① + ⑤ + (① + ⑤)^2 + (① + ⑤)^3 + \cdots$, because such a
sum includes many types of payment more than once. For example, the term
$(① + ⑤)^2 = ①① + ①⑤ + ⑤① + ⑤⑤$ treats ①⑤ and ⑤① as if they were
different, but we want to list each set of coins only once without respect to
order.)

Similarly, if dimes are permitted as well, we get the infinite sum

$$D = (∅ + ⑩ + ⑩^2 + ⑩^3 + ⑩^4 + \cdots)N,$$

which includes terms like $⑩^3⑤^3①^5 = ⑩⑩⑩⑤⑤⑤①①①①①$ when it is
expanded in full. Each of these terms is a different way to make change.
Adding quarters and then half-dollars to the realm of possibilities gives

$$Q = (∅ + ㉕ + ㉕^2 + ㉕^3 + ㉕^4 + \cdots)D;$$

$$C = (∅ + ㊿ + ㊿^2 + ㊿^3 + ㊿^4 + \cdots)Q.$$

Our problem is to find the number of terms in C worth exactly 50¢.

A simple trick solves this problem nicely: We can replace ① by z, ⑤
by $z^5$, ⑩ by $z^{10}$, ㉕ by $z^{25}$, and ㊿ by $z^{50}$. Then each term is replaced
by $z^n$, where n is the monetary value of the original term. For example,
the term ㊿⑩⑤⑤① becomes $z^{50+10+5+5+1} = z^{71}$. The four ways of paying
13 cents, namely $⑩①^3$, $⑤①^8$, $⑤^2①^3$, and $①^{13}$, each reduce to $z^{13}$; hence
the coefficient of $z^{13}$ will be 4 after the z-substitutions are made.

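The z-substitution turns counting into polynomial arithmetic. As a sketch, with helper names of our own choosing, we can multiply the five truncated series $1 + z^d + z^{2d} + \cdots$ for d = 1, 5, 10, 25, 50 and read off a coefficient:

```python
LIMIT = 50

def coin_series(d, limit):
    """Coefficients of 1 + z^d + z^(2d) + ... through z^limit (any number of one coin)."""
    return [1 if n % d == 0 else 0 for n in range(limit + 1)]

def multiply(a, b, limit):
    """Product of two power series, truncated after z^limit."""
    out = [0] * (limit + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(limit + 1 - i):
                out[i + j] += ai * b[j]
    return out

C = [1] + [0] * LIMIT            # the series "1" (pay nothing yet)
for d in (1, 5, 10, 25, 50):     # pennies, nickels, dimes, quarters, half-dollars
    C = multiply(C, coin_series(d, LIMIT), LIMIT)

print(C[13])   # 4
```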
Let $P_n$, $N_n$, $D_n$, $Q_n$, and $C_n$ be the numbers of ways to pay n cents
when we’re allowed to use coins that are worth at most 1, 5, 10, 25, and 50
cents, respectively. Our analysis tells us that these are the coefficients of $z^n$
in the respective power series

$$\begin{aligned}
P &= 1 + z + z^2 + z^3 + z^4 + \cdots,\\
N &= (1 + z^5 + z^{10} + z^{15} + z^{20} + \cdots)P,\\
D &= (1 + z^{10} + z^{20} + z^{30} + z^{40} + \cdots)N,\\
Q &= (1 + z^{25} + z^{50} + z^{75} + z^{100} + \cdots)D,\\
C &= (1 + z^{50} + z^{100} + z^{150} + z^{200} + \cdots)Q.
\end{aligned}$$
Obviously $P_n = 1$ for all n ≥ 0. And a little thought proves that we have
$N_n = \lfloor n/5\rfloor + 1$: To make n cents out of pennies and nickels, we must choose
either 0 or 1 or ... or $\lfloor n/5\rfloor$ nickels, after which there’s only one way to supply
the requisite number of pennies. Thus $P_n$ and $N_n$ are simple; but the values
of $D_n$, $Q_n$, and $C_n$ are increasingly complicated.

One way to deal with these formulas is to realize that $1 + z^m + z^{2m} + \cdots$
is just $1/(1-z^m)$. Thus we can write

$$P = \frac{1}{1-z},\quad N = \frac{P}{1-z^5},\quad D = \frac{N}{1-z^{10}},\quad Q = \frac{D}{1-z^{25}},\quad C = \frac{Q}{1-z^{50}}.$$

Multiplying by the denominators, we have

$$(1-z)P = 1,\quad (1-z^5)N = P,\quad (1-z^{10})D = N,\quad (1-z^{25})Q = D,\quad (1-z^{50})C = Q.$$

Now we can equate coefficients of $z^n$ in these equations, getting recurrence
relations from which the desired coefficients can quickly be computed:

$$\begin{aligned}
P_n &= P_{n-1} + [n=0],\\
N_n &= N_{n-5} + P_n,\\
D_n &= D_{n-10} + N_n,\\
Q_n &= Q_{n-25} + D_n,\\
C_n &= C_{n-50} + Q_n.
\end{aligned}$$

For example, the coefficient of $z^n$ in $D = (1-z^{25})Q$ is equal to $Q_n - Q_{n-25}$;
so we must have $Q_n - Q_{n-25} = D_n$, as claimed.

We could unfold these recurrences and find, for example, that
$Q_n = D_n + D_{n-25} + D_{n-50} + D_{n-75} + \cdots$, stopping when the subscripts get negative.
But the non-iterated form is convenient because each coefficient is computed 
with just one addition, as in Pascal’s triangle. 

How many pennies are there, really? If n is greater than, say, $10^{10}$,
I bet that $P_n = 0$ in the “real world.”

(Not counting the option of charging the tip to a credit card.)

Let’s use the recurrences to find $C_{50}$. First, $C_{50} = C_0 + Q_{50}$; so we want
to know $Q_{50}$. Then $Q_{50} = Q_{25} + D_{50}$, and $Q_{25} = Q_0 + D_{25}$; so we also want
to know $D_{50}$ and $D_{25}$. These $D_n$ depend in turn on $D_{40}$, $D_{30}$, $D_{20}$, $D_{15}$,
$D_{10}$, $D_5$, and on $N_{50}$, $N_{45}$, ..., $N_5$. A simple calculation therefore suffices
to determine all the necessary coefficients:


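$$\begin{array}{c|ccccccccccc}
n & 0 & 5 & 10 & 15 & 20 & 25 & 30 & 35 & 40 & 45 & 50\\ \hline
P_n & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
N_n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\
D_n & 1 & 2 & 4 & 6 & 9 & 12 & 16 & & 25 & & 36\\
Q_n & 1 & & & & & 13 & & & & & 49\\
C_n & 1 & & & & & & & & & & 50
\end{array}$$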


The final value in the table gives us our answer, $C_{50}$: There are exactly
50 ways to leave a 50-cent tip.

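The recurrences use just one addition per coefficient, much as Pascal’s triangle does. Below is a minimal sketch (the function name `change_counts` is hypothetical) that tabulates $P_n$, $N_n$, $D_n$, $Q_n$, and $C_n$ for every n up to 50. It therefore also fills the entries the table above leaves blank.

```python
def change_counts(limit):
    """Tabulate P_n, N_n, D_n, Q_n, C_n for 0 <= n <= limit via the recurrences."""
    def at(seq, n):
        return seq[n] if n >= 0 else 0   # coefficients vanish for negative n

    P, N, D, Q, C = ([0] * (limit + 1) for _ in range(5))
    for n in range(limit + 1):
        P[n] = at(P, n - 1) + (1 if n == 0 else 0)
        N[n] = at(N, n - 5) + P[n]
        D[n] = at(D, n - 10) + N[n]
        Q[n] = at(Q, n - 25) + D[n]
        C[n] = at(C, n - 50) + Q[n]
    return P, N, D, Q, C

P, N, D, Q, C = change_counts(50)
print(D[50], Q[50], C[50])   # 36 49 50
```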
How about a closed form for $C_n$? Multiplying the equations together
gives us the compact expression

$$C = \frac{1}{1-z}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}}, \qquad (7.11)$$

but it’s not obvious how to get from here to the coefficient of $z^n$. Fortunately
there is a way; we’ll return to this problem later in the chapter.

More elegant formulas arise if we consider the problem of giving change
when we live in a land that mints coins of every positive integer denomination
(①, ②, ③, ...) instead of just the five we allowed before. The corresponding
generating function is an infinite product of fractions,

$$\frac{1}{(1-z)(1-z^2)(1-z^3)\cdots},$$

and the coefficient of $z^n$ when these factors are fully multiplied out is called
p(n), the number of partitions of n. A partition of n is a representation of n
as a sum of positive integers, disregarding order. For example, there are seven
different partitions of 5, namely

$$5 = 4+1 = 3+2 = 3+1+1 = 2+2+1 = 2+1+1+1 = 1+1+1+1+1;$$

hence p(5) = 7. (Also p(2) = 2, p(3) = 3, p(4) = 5, and p(6) = 11; it begins
to look as if p(n) is always a prime number. But p(7) = 15, spoiling the
pattern.) There is no closed form for p(n), but the theory of partitions is a
fascinating branch of mathematics in which many remarkable discoveries have
been made. For example, Ramanujan proved that $p(5n+4) \equiv 0 \pmod 5$,
$p(7n+5) \equiv 0 \pmod 7$, and $p(11n+6) \equiv 0 \pmod{11}$, by making ingenious
transformations of generating functions (see Andrews [11, Chapter 10]). 
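For numerical work the infinite product can be truncated at $z^N$. In this sketch, our own illustration rather than a method from the text, each factor $1/(1-z^k)$ is applied as the in-place update `p[n] += p[n - k]`. That is the same one-addition trick used in the coin problem. It also lets us spot-check Ramanujan’s congruences for small n.

```python
def partition_counts(limit):
    """Return [p(0), ..., p(limit)]: coefficients of prod_k 1/(1 - z^k) up to z^limit."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        # multiplying by 1/(1 - z^part) means p_n <- p_n + p_{n-part}, for increasing n
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p

p = partition_counts(100)
print(p[:8])   # [1, 1, 2, 3, 5, 7, 11, 15]

# spot-check Ramanujan's congruences for every index up to 100
assert all(p[5 * n + 4] % 5 == 0 for n in range((100 - 4) // 5 + 1))
assert all(p[7 * n + 5] % 7 == 0 for n in range((100 - 5) // 7 + 1))
assert all(p[11 * n + 6] % 11 == 0 for n in range((100 - 6) // 11 + 1))
```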
7.2 BASIC MANEUVERS

Now let’s look more closely at some of the techniques that make 
power series powerful. 

First a few words about terminology and notation. Our generic generat- 
ing function has the form 

$$G(z) = g_0 + g_1z + g_2z^2 + \cdots = \sum_{n\ge0} g_nz^n, \qquad (7.12)$$

and we say that G(z), or G for short, is the generating function for the
sequence $\langle g_0, g_1, g_2, \ldots\rangle$, which we also call $\langle g_n\rangle$. The coefficient $g_n$ of $z^n$
in G(z) is often denoted $[z^n]\,G(z)$, as in Section 5.4.

The sum in (7.12) runs over all n ≥ 0, but we often find it more
convenient to extend the sum over all integers n. We can do this by simply
regarding $g_{-1} = g_{-2} = \cdots = 0$. In such cases we might still talk about the
sequence $\langle g_0, g_1, g_2, \ldots\rangle$, as if the $g_n$’s didn’t exist for negative n.

Two kinds of “closed forms” come up when we work with generating 
functions. We might have a closed form for G(z), expressed in terms of z; or 
we might have a closed form for g n , expressed in terms of n. For example, the 
generating function for Fibonacci numbers has the closed form $z/(1-z-z^2)$;
the Fibonacci numbers themselves have the closed form $(\phi^n - \hat\phi^n)/\sqrt5$. The
context will explain what kind of closed form is meant. 

Now a few words about perspective. The generating function G(z) ap- 
pears to be two different entities, depending on how we view it. Sometimes 
it is a function of a complex variable z, satisfying all the standard properties 
proved in calculus books. And sometimes it is simply a formal power series, 
with z acting as a placeholder. In the previous section, for example, we used 
the second interpretation; we saw several examples in which z was substi- 
tuted for some feature of a combinatorial object in a “sum” of such objects. 
The coefficient of z n was then the number of combinatorial objects having n 
occurrences of that feature. 

When we view G(z) as a function of a complex variable, its convergence
becomes an issue. We said in Chapter 2 that the infinite series $\sum_{n\ge0} g_nz^n$
converges (absolutely) if and only if there’s a bounding constant A such that
the finite sums $\sum_{0\le n\le N}|g_nz^n|$ never exceed A, for any N. Therefore it’s easy
to see that if $\sum_{n\ge0} g_nz^n$ converges for some value $z = z_0$, it also converges
for all z with $|z| < |z_0|$. Furthermore, we must have $\lim_{n\to\infty}|g_nz_0^n| = 0$;
hence, in the notation of Chapter 9, $g_n = O(|1/z_0|^n)$ if there is convergence
at $z_0$. And conversely if $g_n = O(M^n)$, the series $\sum_{n\ge0} g_nz^n$ converges for
all $|z| < 1/M$. These are the basic facts about convergence of power series.

But for our purposes convergence is usually a red herring, unless we’re
trying to study the asymptotic behavior of the coefficients. Nearly every
operation we perform on generating functions can be justified rigorously as
an operation on formal power series, and such operations are legal even when
the series don’t converge. (The relevant theory can be found, for example, in
Bell [23], Niven [282], and Henrici [182, Chapter 1].)

If physicists can get
mathematicians
mattresses.

Furthermore, even if we throw all caution to the winds and derive formu- 
las without any rigorous justification, we generally can take the results of our 
derivation and prove them by induction. For example, the generating function
for the Fibonacci numbers converges only when $|z| < 1/\phi \approx 0.618$, but
we didn’t need to know that when we proved the formula $F_n = (\phi^n - \hat\phi^n)/\sqrt5$.
The latter formula, once discovered, can be verified directly, if we don’t trust 
the theory of formal power series. Therefore we’ll ignore questions of conver- 
gence in this chapter; it’s more a hindrance than a help. 

So much for perspective. Next we look at our main tools for reshaping 
generating functions — adding, shifting, changing variables, differentiating, 
integrating, and multiplying. In what follows we assume that, unless stated 
otherwise, F(z) and G(z) are the generating functions for the sequences $\langle f_n\rangle$
and $\langle g_n\rangle$. We also assume that the $f_n$’s and $g_n$’s are zero for negative n,
since this saves us some bickering with the limits of summation. 

It’s pretty obvious what happens when we add constant multiples of 
F and G together: 

$$\alpha F(z) + \beta G(z) = \alpha\sum_n f_nz^n + \beta\sum_n g_nz^n = \sum_n(\alpha f_n + \beta g_n)z^n. \qquad (7.13)$$

This gives us the generating function for the sequence $\langle\alpha f_n + \beta g_n\rangle$.

Shifting a generating function isn’t much harder. To shift G(z) right by
m places, that is, to form the generating function for the sequence
$\langle 0, \ldots, 0, g_0, g_1, \ldots\rangle = \langle g_{n-m}\rangle$ with m leading 0’s, we simply multiply by $z^m$:

$$z^mG(z) = \sum_n g_nz^{n+m} = \sum_n g_{n-m}z^n, \qquad \text{integer } m\ge0. \qquad (7.14)$$

This is the operation we used (twice), along with addition, to deduce the
equation $(1-z-z^2)F(z) = z$ on our way to finding a closed form for the
Fibonacci numbers in Chapter 6.

And to shift G(z) left m places, that is, to form the generating function
for the sequence $\langle g_m, g_{m+1}, g_{m+2}, \ldots\rangle = \langle g_{n+m}\rangle$ with the first m elements
discarded, we subtract off the first m terms and then divide by $z^m$:

$$\frac{G(z) - g_0 - g_1z - \cdots - g_{m-1}z^{m-1}}{z^m} = \sum_{n\ge m} g_nz^{n-m} = \sum_{n\ge0} g_{n+m}z^n. \qquad (7.15)$$

(We can’t extend this last sum over all n unless $g_0 = \cdots = g_{m-1} = 0$.)
Replacing the z by a constant multiple is another of our tricks:

$$G(cz) = \sum_n g_n(cz)^n = \sum_n c^ng_nz^n; \qquad (7.16)$$

this yields the generating function for the sequence $\langle c^ng_n\rangle$. The special case
c = −1 is particularly useful.

Often we want to bring down a factor of n into the coefficient. Differentiation
is what lets us do that:

$$G'(z) = g_1 + 2g_2z + 3g_3z^2 + \cdots = \sum_n(n+1)g_{n+1}z^n. \qquad (7.17)$$

Shifting this right one place gives us a form that’s sometimes more useful,

$$zG'(z) = \sum_n ng_nz^n. \qquad (7.18)$$

This is the generating function for the sequence $\langle ng_n\rangle$. Repeated differentiation
would allow us to multiply $g_n$ by any desired polynomial in n.
Integration, the inverse operation, lets us divide the terms by n:

$$\int_0^z G(t)\,dt = g_0z + \tfrac12g_1z^2 + \tfrac13g_2z^3 + \cdots = \sum_{n\ge1}\frac1n\,g_{n-1}z^n. \qquad (7.19)$$

(Notice that the constant term is zero.) If we want the generating function
for $\langle g_n/n\rangle$ instead of $\langle g_{n-1}/n\rangle$, we should first shift left one place, replacing
G(t) by $(G(t) - g_0)/t$ in the integral.

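Operations (7.17) through (7.19) act on coefficient lists in a simple way. The sketch below assumes a generating function is represented by a finite list of its leading coefficients, and that those coefficients are integers (integration produces exact `Fraction`s). The function names are our own.

```python
from fractions import Fraction

def derivative(g):
    """Coefficients of G'(z): the sequence <(n+1) g_{n+1}>, as in (7.17)."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]

def times_n(g):
    """Coefficients of z G'(z): the sequence <n g_n>, as in (7.18)."""
    return [n * gn for n, gn in enumerate(g)]

def integral(g):
    """Coefficients of the integral of G from 0 to z: <g_{n-1}/n>, constant term 0 (7.19)."""
    return [Fraction(0)] + [Fraction(g[n - 1], n) for n in range(1, len(g) + 1)]

ones = [1] * 6              # leading terms of 1/(1 - z)
print(derivative(ones))     # [1, 2, 3, 4, 5]: leading terms of 1/(1 - z)^2
print(integral(ones)[:4])   # coefficients 0, 1, 1/2, 1/3 (as Fractions) of ln(1/(1 - z))
```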
Finally, here’s how we multiply generating functions together:

$$\begin{aligned}
F(z)G(z) &= (f_0 + f_1z + f_2z^2 + \cdots)(g_0 + g_1z + g_2z^2 + \cdots)\\
&= (f_0g_0) + (f_0g_1 + f_1g_0)z + (f_0g_2 + f_1g_1 + f_2g_0)z^2 + \cdots\\
&= \sum_n\Bigl(\sum_k f_kg_{n-k}\Bigr)z^n. \qquad (7.20)
\end{aligned}$$

As we observed in Chapter 5, this gives the generating function for the
sequence $\langle h_n\rangle$, the convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$. The sum $h_n = \sum_k f_kg_{n-k}$
can also be written $h_n = \sum_{k=0}^n f_kg_{n-k}$, because $f_k = 0$ when k < 0 and
$g_{n-k} = 0$ when k > n. Multiplication/convolution is a little more complicated
than the other operations, but it’s very useful, so useful that we will
spend all of Section 7.5 below looking at examples of it.

Multiplication has several special cases that are worth considering as
operations in themselves. We’ve already seen one of these: When $F(z) = z^m$
we get the shifting operation (7.14). In that case the sum $h_n$ becomes the
single term $g_{n-m}$, because all $f_k$’s are 0 except for $f_m = 1$.

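Convolution (7.20) on finite coefficient lists is a double loop. The following is a sketch with a function name of our choosing. It treats the lists as polynomials, so the result has len(f) + len(g) − 1 terms. Taking F(z) = z² reproduces the shift (7.14).

```python
def convolve(f, g):
    """Coefficients of F(z)G(z): h_n = sum_{k=0}^{n} f_k g_{n-k}, as in (7.20)."""
    h = [0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for j, gj in enumerate(g):
            h[k + j] += fk * gj
    return h

g = [3, 1, 4, 1, 5]
print(convolve([0, 0, 1], g))       # z^2 G(z): [0, 0, 3, 1, 4, 1, 5]
print(convolve([1, 2, 1], [1, 2, 1]))   # (1 + z)^4: [1, 4, 6, 4, 1]
```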

I fear d generating- 
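Table 335 Generating function manipulations.

$$\begin{aligned}
\alpha F(z) + \beta G(z) &= \sum_n(\alpha f_n + \beta g_n)z^n\\
z^mG(z) &= \sum_n g_{n-m}z^n, \qquad \text{integer } m\ge0\\
\frac{G(z) - g_0 - g_1z - \cdots - g_{m-1}z^{m-1}}{z^m} &= \sum_{n\ge0} g_{n+m}z^n, \qquad \text{integer } m\ge0\\
G(cz) &= \sum_n c^ng_nz^n\\
G'(z) &= \sum_n(n+1)g_{n+1}z^n\\
zG'(z) &= \sum_n ng_nz^n\\
\int_0^z G(t)\,dt &= \sum_{n\ge1}\frac1n\,g_{n-1}z^n\\
F(z)G(z) &= \sum_n\Bigl(\sum_k f_kg_{n-k}\Bigr)z^n
\end{aligned}$$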


Another useful special case arises when F(z) is the familiar function
$1/(1-z) = 1 + z + z^2 + \cdots$; then all $f_k$’s (for k ≥ 0) are 1 and we have
the important formula

$$\frac{1}{1-z}\,G(z) = \sum_n\Bigl(\sum_{k\ge0} g_{n-k}\Bigr)z^n = \sum_n\Bigl(\sum_{k\le n} g_k\Bigr)z^n. \qquad (7.21)$$

Multiplying a generating function by $1/(1-z)$ gives us the generating function
for the cumulative sums of the original sequence.

Table 335 summarizes the operations we’ve discussed so far. To use 
all these manipulations effectively it helps to have a healthy repertoire of 
generating functions in stock. Table 336 lists the simplest ones; we can use 
those to get started and to solve quite a few problems. 
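Formula (7.21) can be checked directly. The sketch below (our own helper name) multiplies by the truncated series 1 + z + z² + ⋯ using the convolution sum. It confirms that the result is the running sum of the coefficients.

```python
from itertools import accumulate

def mul_truncated(f, g):
    """First len(g) coefficients of F(z)G(z), via h_n = sum_{k=0}^{n} f_k g_{n-k}."""
    return [sum(f[k] * g[n - k] for k in range(n + 1) if k < len(f))
            for n in range(len(g))]

g = [2, 7, 1, 8, 2, 8]
ones = [1] * len(g)                 # truncation of 1/(1 - z)
print(mul_truncated(ones, g))       # [2, 9, 10, 18, 20, 28]
print(list(accumulate(g)))          # the same cumulative sums
```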

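Each of the generating functions in Table 336 is important enough to
be memorized. Many of them are special cases of the others, and many of
them can be derived quickly from the others by using the basic operations of
Table 335; therefore the memory work isn’t very hard.

Table 336 Simple sequences and their generating functions.

$$\begin{array}{lll}
\text{sequence} & \text{generating function} & \text{closed form}\\ \hline
\langle1,0,0,0,0,0,\ldots\rangle & \sum_{n\ge0}[n=0]z^n & 1\\
\langle0,\ldots,0,1,0,0,\ldots\rangle & \sum_{n\ge0}[n=m]z^n & z^m\\
\langle1,1,1,1,1,1,\ldots\rangle & \sum_{n\ge0}z^n & \frac{1}{1-z}\\
\langle1,-1,1,-1,1,-1,\ldots\rangle & \sum_{n\ge0}(-1)^nz^n & \frac{1}{1+z}\\
\langle1,0,1,0,1,0,\ldots\rangle & \sum_{n\ge0}[2\backslash n]z^n & \frac{1}{1-z^2}\\
\langle1,0,\ldots,0,1,0,\ldots,0,1,0,\ldots\rangle & \sum_{n\ge0}[m\backslash n]z^n & \frac{1}{1-z^m}\\
\langle1,2,3,4,5,6,\ldots\rangle & \sum_{n\ge0}(n+1)z^n & \frac{1}{(1-z)^2}\\
\langle1,2,4,8,16,32,\ldots\rangle & \sum_{n\ge0}2^nz^n & \frac{1}{1-2z}\\
\langle1,4,6,4,1,0,0,\ldots\rangle & \sum_{n\ge0}\binom4n z^n & (1+z)^4\\
\langle1,c,\binom c2,\binom c3,\ldots\rangle & \sum_{n\ge0}\binom cn z^n & (1+z)^c\\
\langle1,c,\binom{c+1}2,\binom{c+2}3,\ldots\rangle & \sum_{n\ge0}\binom{c+n-1}{n}z^n & \frac{1}{(1-z)^c}\\
\langle1,c,c^2,c^3,\ldots\rangle & \sum_{n\ge0}c^nz^n & \frac{1}{1-cz}\\
\langle1,\binom{m+1}m,\binom{m+2}m,\binom{m+3}m,\ldots\rangle & \sum_{n\ge0}\binom{m+n}{m}z^n & \frac{1}{(1-z)^{m+1}}\\
\langle0,1,\frac12,\frac13,\frac14,\ldots\rangle & \sum_{n\ge1}\frac1n z^n & \ln\frac{1}{1-z}\\
\langle0,1,-\frac12,\frac13,-\frac14,\ldots\rangle & \sum_{n\ge1}\frac{(-1)^{n+1}}{n}z^n & \ln(1+z)\\
\langle1,1,\frac12,\frac16,\frac1{24},\frac1{120},\ldots\rangle & \sum_{n\ge0}\frac{1}{n!}z^n & e^z
\end{array}$$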

For example, let’s consider the sequence $\langle1,2,3,4,\ldots\rangle$, whose generating
function $1/(1-z)^2$ is often useful. This generating function appears near the
middle of Table 336, and it’s also the special case m = 1 of
$\langle1,\binom{m+1}m,\binom{m+2}m,\binom{m+3}m,\ldots\rangle$, which appears further down; it’s also the
special case c = 2 of the closely related sequence $\langle1,c,\binom{c+1}2,\binom{c+2}3,\ldots\rangle$. We can
derive it from the generating function for $\langle1,1,1,1,\ldots\rangle$ by taking cumulative
sums as in (7.21); that is, by dividing $1/(1-z)$ by $(1-z)$. Or we can derive it
from $\langle1,1,1,1,\ldots\rangle$ by differentiation, using (7.17).

Hint: If the se-

OK, OK, I’m convinced already.

The sequence $\langle1,0,1,0,\ldots\rangle$ is another one whose generating function can
be obtained in many ways. We can obviously derive the formula $\sum_n z^{2n} =
1/(1-z^2)$ by substituting $z^2$ for z in the identity $\sum_n z^n = 1/(1-z)$; we can
also apply cumulative summation to the sequence $\langle1,-1,1,-1,\ldots\rangle$, whose
generating function is $1/(1+z)$, getting $1/\bigl((1+z)(1-z)\bigr) = 1/(1-z^2)$. And
there’s also a third way, which is based on a general method for extracting
the even-numbered terms $\langle g_0, 0, g_2, 0, g_4, 0, \ldots\rangle$ of any given sequence: If we
add G(−z) to G(+z) we get

$$G(z) + G(-z) = \sum_n g_n\bigl(1 + (-1)^n\bigr)z^n = 2\sum_n g_n[n \text{ even}]z^n;$$

therefore

$$\frac{G(z) + G(-z)}{2} = \sum_n g_{2n}z^{2n}. \qquad (7.22)$$

The odd-numbered terms can be extracted in a similar way,

$$\frac{G(z) - G(-z)}{2} = \sum_n g_{2n+1}z^{2n+1}. \qquad (7.23)$$

In the special case where $g_n = 1$ and $G(z) = 1/(1-z)$, the generating function
for $\langle1,0,1,0,\ldots\rangle$ is $\tfrac12\bigl(G(z) + G(-z)\bigr) = \tfrac12\Bigl(\frac{1}{1-z} + \frac{1}{1+z}\Bigr) = \frac{1}{1-z^2}$.

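On coefficient lists, (7.22) and (7.23) amount to averaging g with its sign-alternated copy. The sketch below (function names ours) assumes integer coefficients, so the halving is exact.

```python
def g_of_minus_z(g):
    """Coefficients of G(-z): the special case c = -1 of (7.16)."""
    return [(-1) ** n * gn for n, gn in enumerate(g)]

def even_part(g):
    """(G(z) + G(-z))/2, equation (7.22): keeps g_0, 0, g_2, 0, ..."""
    return [(x + y) // 2 for x, y in zip(g, g_of_minus_z(g))]

def odd_part(g):
    """(G(z) - G(-z))/2, equation (7.23): keeps 0, g_1, 0, g_3, ..."""
    return [(x - y) // 2 for x, y in zip(g, g_of_minus_z(g))]

print(even_part([1] * 6))   # [1, 0, 1, 0, 1, 0], the leading terms of 1/(1 - z^2)
```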
Let’s try this extraction trick on the generating function for Fibonacci
numbers. We know that $\sum_n F_nz^n = z/(1-z-z^2)$; hence

$$\begin{aligned}
\sum_n F_{2n}z^{2n} &= \frac12\Bigl(\frac{z}{1-z-z^2} + \frac{-z}{1+z-z^2}\Bigr)\\
&= \frac12\Bigl(\frac{z + z^2 - z^3 - z + z^2 + z^3}{(1-z^2)^2 - z^2}\Bigr) = \frac{z^2}{1-3z^2+z^4}.
\end{aligned}$$

This generates the sequence $\langle F_0, 0, F_2, 0, F_4, \ldots\rangle$; hence the sequence of alternate
F’s, $\langle F_0, F_2, F_4, F_6, \ldots\rangle = \langle0, 1, 3, 8, \ldots\rangle$, has a simple generating function:

$$\sum_n F_{2n}z^n = \frac{z}{1-3z+z^2}. \qquad (7.24)$$

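A numerical check of (7.24). The sketch below expands $z/(1-3z+z^2)$ by power-series long division, using a helper `series_div` that is our own and assumes a denominator with constant term 1. It then compares the result with every other Fibonacci number.

```python
def series_div(num, den, terms):
    """First `terms` coefficients of num(z)/den(z); assumes den[0] == 1."""
    assert den[0] == 1
    out = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * out[n - j]
        out.append(c)
    return out

fib = [0, 1]
while len(fib) < 20:
    fib.append(fib[-1] + fib[-2])

alternate = series_div([0, 1], [1, -3, 1], 10)   # z / (1 - 3z + z^2), equation (7.24)
print(alternate)   # [0, 1, 3, 8, 21, 55, 144, 377, 987, 2584]
assert alternate == fib[0:20:2]                   # F_0, F_2, ..., F_18
```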
7.3 SOLVING RECURRENCES

Now let’s focus our attention on one of the most important uses of 
generating functions: the solution of recurrence relations. 

Given a sequence $\langle g_n\rangle$ that satisfies a given recurrence, we seek a closed
form for $g_n$ in terms of n. A solution to this problem via generating functions
proceeds in four steps that are almost mechanical enough to be programmed 
on a computer: 

1 Write down a single equation that expresses $g_n$ in terms of other elements
of the sequence. This equation should be valid for all integers n, assuming
that $g_{-1} = g_{-2} = \cdots = 0$.

2 Multiply both sides of the equation by $z^n$ and sum over all n. This gives,
on the left, the sum $\sum_n g_nz^n$, which is the generating function G(z). The
right-hand side should be manipulated so that it becomes some other
expression involving G(z).

3 Solve the resulting equation, getting a closed form for G(z).

4 Expand G(z) into a power series and read off the coefficient of $z^n$; this is
a closed form for $g_n$.

This method works because the single function G(z) represents the entire
sequence $\langle g_n\rangle$ in such a way that many manipulations are possible.

Example 1: Fibonacci numbers revisited. 

For example, let’s rerun the derivation of Fibonacci numbers from Chap- 
ter 6. In that chapter we were feeling our way, learning a new method; now 
we can be more systematic. The given recurrence is 

$$g_0 = 0;\qquad g_1 = 1;\qquad g_n = g_{n-1} + g_{n-2}, \quad \text{for } n\ge2.$$

We will find a closed form for $g_n$ by using the four steps above.

Step 1 tells us to write the recurrence as a “single equation” for $g_n$. We
could say

$$g_n = \begin{cases} 0, & \text{if } n\le0;\\ 1, & \text{if } n=1;\\ g_{n-1} + g_{n-2}, & \text{if } n>1; \end{cases}$$

but this is cheating. Step 1 really asks for a formula that doesn’t involve a
case-by-case construction. The single equation

$$g_n = g_{n-1} + g_{n-2}$$

works for n ≥ 2, and it also holds when n ≤ 0 (because we have $g_0 = 0$
and $g_{\text{negative}} = 0$). But when n = 1 we get 1 on the left and 0 on the right.
Fortunately the problem is easy to fix, since we can add [n = 1] to the right;
this adds 1 when n = 1, and it makes no change when n ≠ 1. So, we have

$$g_n = g_{n-1} + g_{n-2} + [n=1];$$

this is the equation called for in Step 1.

Step 2 now asks us to transform the equation for $\langle g_n\rangle$ into an equation
for $G(z) = \sum_n g_nz^n$. The task is not difficult:

$$\begin{aligned}
G(z) = \sum_n g_nz^n &= \sum_n g_{n-1}z^n + \sum_n g_{n-2}z^n + \sum_n[n=1]z^n\\
&= \sum_n g_nz^{n+1} + \sum_n g_nz^{n+2} + z\\
&= zG(z) + z^2G(z) + z.
\end{aligned}$$

Step 3 is also simple in this case; we have

$$G(z) = \frac{z}{1-z-z^2},$$

which of course comes as no surprise.

Step 4 is the clincher. We carried it out in Chapter 6 by having a sudden
flash of inspiration; let’s go more slowly now, so that we can get through
Step 4 safely later, when we meet problems that are more difficult. What is
the coefficient of $z^n$ when $z/(1-z-z^2)$ is expanded in a power series? More
generally, if we are given any rational function

$$R(z) = \frac{P(z)}{Q(z)},$$

where P and Q are polynomials, what is the coefficient $[z^n]\,R(z)$?

There’s one kind of rational function whose coefficients are particularly
nice, namely

$$\frac{a}{(1-\rho z)^{m+1}} = \sum_{n\ge0}\binom{m+n}{m}a\rho^nz^n. \qquad (7.25)$$

(The case ρ = 1 appears in Table 336, and we can get the general formula
shown here by substituting ρz for z.) A finite sum of functions like (7.25),

$$S(z) = \frac{a_1}{(1-\rho_1z)^{m_1+1}} + \frac{a_2}{(1-\rho_2z)^{m_2+1}} + \cdots + \frac{a_l}{(1-\rho_lz)^{m_l+1}}, \qquad (7.26)$$

also has nice coefficients,

$$[z^n]\,S(z) = a_1\binom{m_1+n}{m_1}\rho_1^n + a_2\binom{m_2+n}{m_2}\rho_2^n + \cdots + a_l\binom{m_l+n}{m_l}\rho_l^n. \qquad (7.27)$$

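Formulas (7.25) and (7.27) can be checked by brute force. In the sketch below (helper names ours), $a/(1-\rho z)^{m+1}$ is expanded by dividing m + 1 times by $(1-\rho z)$. Each division is the left-to-right update $c_n \leftarrow c_n + \rho c_{n-1}$. The two poles in `poles` are arbitrary illustrative parameters, not values from the text.

```python
from math import comb

def pole_series(a, rho, m, terms):
    """Leading coefficients of a / (1 - rho z)^(m+1), by repeated division."""
    coeffs = [a] + [0] * (terms - 1)
    for _ in range(m + 1):
        # dividing by (1 - rho z): c_n <- c_n + rho * c_{n-1}, left to right
        for n in range(1, terms):
            coeffs[n] += rho * coeffs[n - 1]
    return coeffs

def closed_form(a, rho, m, n):
    """The coefficient of z^n predicted by (7.25)."""
    return comb(m + n, m) * a * rho ** n

# S(z) = 5/(1 + 2z)^4 + 7/(1 - 3z): an arbitrary instance of (7.26)
poles = [(5, -2, 3), (7, 3, 0)]
S = [sum(col) for col in zip(*(pole_series(a, r, m, 10) for a, r, m in poles))]
for n in range(10):
    assert S[n] == sum(closed_form(a, r, m, n) for a, r, m in poles)   # (7.27)
print(S[:4])   # [12, -19, 263, -611]
```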

We will show that every rational function R(z) such that R(0) ≠ ∞ can be
expressed in the form

$$R(z) = S(z) + T(z), \qquad (7.28)$$

where S(z) has the form (7.26) and T(z) is a polynomial. Therefore there is a
closed form for the coefficients $[z^n]\,R(z)$. Finding S(z) and T(z) is equivalent
to finding the “partial fraction expansion” of R(z).

Notice that S(z) = ∞ when z has the values $1/\rho_1, \ldots, 1/\rho_l$. Therefore
the numbers $\rho_k$ that we need to find, if we’re going to succeed in expressing
R(z) in the desired form S(z) + T(z), must be the reciprocals of the numbers
$\alpha_k$ where $Q(\alpha_k) = 0$. (Recall that R(z) = P(z)/Q(z), where P and Q are
polynomials; we have R(z) = ∞ only if Q(z) = 0.)

Suppose Q(z) has the form

$$Q(z) = q_0 + q_1z + \cdots + q_mz^m, \qquad \text{where } q_0\ne0 \text{ and } q_m\ne0.$$

The “reflected” polynomial

$$Q^R(z) = q_0z^m + q_1z^{m-1} + \cdots + q_m$$

has an important relation to Q(z):

$$Q^R(z) = q_0(z-\rho_1)\ldots(z-\rho_m) \iff Q(z) = q_0(1-\rho_1z)\ldots(1-\rho_mz).$$

Thus, the roots of $Q^R$ are the reciprocals of the roots of Q, and vice versa.
We can therefore find the numbers $\rho_k$ we seek by factoring the reflected
polynomial $Q^R(z)$.

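For a quadratic Q the reflection can be checked with the quadratic formula. This is a sketch with our own helper name. It uses `cmath.sqrt` so that complex roots would also be handled. The sample polynomial Q(z) = 1 − 3z + 2z² = (1 − z)(1 − 2z) is an illustrative choice, not one from the text.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a z^2 + b z + c = 0 (complex arithmetic; assumes a != 0)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

# Q(z) = q0 + q1 z + q2 z^2; its reflection is Q^R(z) = q0 z^2 + q1 z + q2
q0, q1, q2 = 1, -3, 2
rho = quadratic_roots(q0, q1, q2)     # roots of Q^R
alpha = quadratic_roots(q2, q1, q0)   # roots of Q itself

# every root of Q^R is the reciprocal of some root of Q
assert all(any(abs(r * x - 1) < 1e-12 for x in alpha) for r in rho)

# and Q(z) = q0 (1 - rho_1 z)(1 - rho_2 z), checked at a sample point
z = 0.3
assert abs((q0 + q1 * z + q2 * z * z) - q0 * (1 - rho[0] * z) * (1 - rho[1] * z)) < 1e-12
print(sorted(r.real for r in rho))    # [1.0, 2.0]
```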
For example, in the Fibonacci case we have

$$Q(z) = 1 - z - z^2; \qquad Q^R(z) = z^2 - z - 1.$$

The roots of $Q^R$ can be found by setting (a, b, c) = (1, −1, −1) in the quadratic
formula $(-b \pm \sqrt{b^2-4ac})/2a$; we find that they are

$$\phi = \frac{1+\sqrt5}{2} \qquad\text{and}\qquad \hat\phi = \frac{1-\sqrt5}{2}.$$

Therefore $Q^R(z) = (z-\phi)(z-\hat\phi)$ and $Q(z) = (1-\phi z)(1-\hat\phi z)$.
Once we’ve found the ρ’s, we can proceed to find the partial fraction
expansion. It’s simplest if all the roots are distinct, so let’s consider that
special case first. We might as well state and prove the general result formally:

Rational Expansion Theorem for Distinct Roots.

If R(z) = P(z)/Q(z), where $Q(z) = q_0(1-\rho_1z)\ldots(1-\rho_lz)$ and the
numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if P(z) is a polynomial of degree less
than l, then

$$[z^n]\,R(z) = a_1\rho_1^n + \cdots + a_l\rho_l^n, \qquad \text{where } a_k = \frac{-\rho_kP(1/\rho_k)}{Q'(1/\rho_k)}. \qquad (7.29)$$


Proof: Let $a_1, \ldots, a_l$ be the stated constants. Formula (7.29) holds if
R(z) = P(z)/Q(z) is equal to

$$S(z) = \frac{a_1}{1-\rho_1z} + \cdots + \frac{a_l}{1-\rho_lz}.$$

And we can prove that R(z) = S(z) by showing that the function T(z) =
R(z) − S(z) is not infinite as $z\to1/\rho_k$. For this will show that the rational
function T(z) is never infinite; hence T(z) must be a polynomial. We also can
show that T(z) → 0 as z → ∞; hence T(z) must be zero.

Let $\alpha_k = 1/\rho_k$. To prove that $\lim_{z\to\alpha_k}T(z) \ne \infty$, it suffices to show that
$\lim_{z\to\alpha_k}(z-\alpha_k)T(z) = 0$, because T(z) is a rational function of z. Thus we
want to show that

$$\lim_{z\to\alpha_k}(z-\alpha_k)R(z) = \lim_{z\to\alpha_k}(z-\alpha_k)S(z).$$

The right-hand limit equals $\lim_{z\to\alpha_k}a_k(z-\alpha_k)/(1-\rho_kz) = -a_k/\rho_k$, because
$(1-\rho_kz) = -\rho_k(z-\alpha_k)$ and $(z-\alpha_k)/(1-\rho_jz)\to0$ for j ≠ k. The left-hand
limit is

$$\lim_{z\to\alpha_k}(z-\alpha_k)\frac{P(z)}{Q(z)} = P(\alpha_k)\lim_{z\to\alpha_k}\frac{z-\alpha_k}{Q(z)} = \frac{P(\alpha_k)}{Q'(\alpha_k)},$$

by L’Hospital’s rule; and this equals $-a_k/\rho_k$ by the definition of $a_k$ in (7.29).
Thus the theorem is proved.

Returning to the Fibonacci example, we have P(z) = z and $Q(z) =
1-z-z^2 = (1-\phi z)(1-\hat\phi z)$; hence $Q'(z) = -1-2z$, and

$$\frac{-\rho P(1/\rho)}{Q'(1/\rho)} = \frac{-1}{-1-2/\rho} = \frac{\rho}{\rho+2}.$$

According to (7.29), the coefficient of $\phi^n$ in $[z^n]\,R(z)$ is therefore $\phi/(\phi+2) =
1/\sqrt5$ (because $\phi + 2 = \sqrt5\,\phi$); the coefficient of $\hat\phi^n$ is $\hat\phi/(\hat\phi+2) = -1/\sqrt5$
(because $\hat\phi + 2 = -\sqrt5\,\hat\phi$). So the theorem tells us that $F_n = (\phi^n - \hat\phi^n)/\sqrt5$,
as in (6.123).

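The theorem can be checked numerically for the Fibonacci case. The sketch below computes each $a_k$ from (7.29) in floating point and rounds the resulting sum to the nearest integer. The rounding and the function names are our own choices, not part of the theorem.

```python
import math

def P(z):
    """Numerator of R(z) = z / (1 - z - z^2)."""
    return z

def Q_prime(z):
    """Derivative of Q(z) = 1 - z - z^2."""
    return -1 - 2 * z

# rho_1, rho_2: the roots of the reflected polynomial Q^R(z) = z^2 - z - 1
rhos = [(1 + math.sqrt(5)) / 2, (1 - math.sqrt(5)) / 2]

# (7.29): a_k = -rho_k P(1/rho_k) / Q'(1/rho_k)
a = [-r * P(1 / r) / Q_prime(1 / r) for r in rhos]

def coeff(n):
    """[z^n] R(z) from (7.29), rounded to absorb floating-point error."""
    return round(sum(ak * r ** n for ak, r in zip(a, rhos)))

print([coeff(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```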


342 GENERATING FUNCTIONS 


When Q(z) has repeated roots, the calculations become more difficult, 
but we can beef up the proof of the theorem and prove the following more 
general result: 

General Expansion Theorem for Rational Generating Functions.

If $R(z) = P(z)/Q(z)$, where $Q(z) = q_0(1-\rho_1 z)^{d_1}\ldots(1-\rho_l z)^{d_l}$ and the numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if $P(z)$ is a polynomial of degree less than $d_1 + \cdots + d_l$, then

$$[z^n]R(z) = f_1(n)\rho_1^n + \cdots + f_l(n)\rho_l^n \quad\text{for all } n \ge 0, \tag{7.30}$$

where each $f_k(n)$ is a polynomial of degree $d_k - 1$ with leading coefficient

$$a_k = \frac{(-\rho_k)^{d_k}P(1/\rho_k)\,d_k}{Q^{(d_k)}(1/\rho_k)} = \frac{P(1/\rho_k)}{(d_k-1)!\,q_0\prod_{j\ne k}(1-\rho_j/\rho_k)^{d_j}}. \tag{7.31}$$

This can be proved by induction on $\max(d_1, \ldots, d_l)$, using the fact that

$$R(z) - \frac{a_1(d_1-1)!}{(1-\rho_1 z)^{d_1}} - \cdots - \frac{a_l(d_l-1)!}{(1-\rho_l z)^{d_l}}$$

is a rational function whose denominator polynomial is not divisible by $(1-\rho_k z)^{d_k}$ for any $k$.
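The second form of (7.31) is easy to mechanize with exact rational arithmetic. The following Python sketch is illustrative only (the function name `leading_coefficient` and its argument layout are our own); it assumes the roots $\rho_k$ are rational so that `Fraction` can represent them exactly, and applies the formula to $R(z) = (1+z+z^2)/\bigl((1-2z)(1+z)^2\bigr)$, the rational function that turns up in Example 2 below.

```python
from fractions import Fraction
from math import factorial


def leading_coefficient(P, q0, rhos, ds, k):
    """Second form of (7.31):
        a_k = P(1/rho_k) / ((d_k - 1)! * q0 * prod_{j != k} (1 - rho_j/rho_k)^{d_j}),
    for Q(z) = q0 * prod_j (1 - rho_j z)^{d_j} with distinct rho_j.
    P is any callable polynomial; rhos must be exact (int or Fraction)."""
    rho_k = Fraction(rhos[k])
    denominator = factorial(ds[k] - 1) * Fraction(q0)
    for j, (rho_j, d_j) in enumerate(zip(rhos, ds)):
        if j != k:
            denominator *= (1 - Fraction(rho_j) / rho_k) ** d_j
    return P(1 / rho_k) / denominator


def numerator(z):
    return 1 + z + z * z


# R(z) = (1 + z + z^2) / ((1 - 2z)(1 + z)^2): q0 = 1, rho = (2, -1), d = (1, 2).
a1 = leading_coefficient(numerator, 1, [2, -1], [1, 2], 0)  # Fraction(7, 9)
a2 = leading_coefficient(numerator, 1, [2, -1], [1, 2], 1)  # Fraction(1, 3)
```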

Example 2: A more-or-less random recurrence. 

Now that we’ve seen some general methods, we’re ready to tackle new 
problems. Let’s try to find a closed form for the recurrence 


$$g_0 = g_1 = 1;\qquad g_n = g_{n-1} + 2g_{n-2} + (-1)^n, \quad\text{for } n \ge 2. \tag{7.32}$$

It’s always a good idea to make a table of small cases first, and the recurrence 
lets us do that easily: 


n        0    1    2    3    4    5    6    7
(-1)^n   1   -1    1   -1    1   -1    1   -1
g_n      1    1    4    5   14   23   52   97


No closed form is evident, and this sequence isn’t even listed in Sloane’s 
Handbook [330]; so we need to go through the four-step process if we want 
to discover the solution. 
Step 1 is easy, since we merely need to insert fudge factors to fix things when $n < 2$: The equation

$$g_n = g_{n-1} + 2g_{n-2} + (-1)^n[n\ge0] + [n=1]$$

holds for all integers $n$. Now we can carry out Step 2:

$$\begin{aligned}
G(z) = \sum_n g_n z^n &= \sum_n g_{n-1}z^n + 2\sum_n g_{n-2}z^n + \sum_{n\ge0}(-1)^n z^n + \sum_{n=1} z^n\\
&= zG(z) + 2z^2G(z) + \frac{1}{1+z} + z.
\end{aligned}$$

(Incidentally, we could also have used $\binom{-1}{n}$ instead of $(-1)^n[n\ge0]$, thereby getting $\sum_n\binom{-1}{n}z^n = (1+z)^{-1}$ by the binomial theorem.) Step 3 is elementary algebra, which yields

$$G(z) = \frac{1 + z(1+z)}{(1+z)(1-z-2z^2)} = \frac{1+z+z^2}{(1-2z)(1+z)^2}.$$

And that leaves us with Step 4. 

The squared factor in the denominator is a bit troublesome, since we know that repeated roots are more complicated than distinct roots; but there it is. We have two roots, $\rho_1 = 2$ and $\rho_2 = -1$; the general expansion theorem (7.30) tells us that

$$g_n = a_1 2^n + (a_2 n + c)(-1)^n$$

for some constant $c$, where

$$a_1 = \frac{1 + 1/2 + 1/4}{(1+1/2)^2} = \frac79, \qquad a_2 = \frac{1 - 1 + 1}{1 - 2/(-1)} = \frac13.$$

(The second formula for $a_k$ in (7.31) is easier to use than the first one when the denominator has nice factors. We simply substitute $z = 1/\rho_k$ everywhere in $R(z)$, except in the factor where this gives zero, and divide by $(d_k-1)!$; this gives the coefficient of $n^{d_k-1}\rho_k^n$.) Plugging in $n = 0$ tells us that the value of the remaining constant $c$ had better be $\frac29$; hence our answer is

$$g_n = \tfrac79\,2^n + \bigl(\tfrac13 n + \tfrac29\bigr)(-1)^n. \tag{7.33}$$

It doesn’t hurt to check the cases n = 1 and 2, just to be sure that we didn’t 
foul up. Maybe we should even try n = 3, since this formula looks weird. But 
it’s correct, all right. 
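Checking $n = 1$, 2, and 3 by hand is easy; a short Python loop (an illustrative sketch, with helper names of our choosing) can check many more cases at once by comparing the recurrence (7.32) with the closed form (7.33) in exact rational arithmetic.

```python
from fractions import Fraction


def g_by_recurrence(count):
    """g_0, ..., g_{count-1} from (7.32): g_0 = g_1 = 1, g_n = g_{n-1} + 2 g_{n-2} + (-1)^n."""
    g = [1, 1]
    for n in range(2, count):
        g.append(g[n - 1] + 2 * g[n - 2] + (-1) ** n)
    return g[:count]


def g_closed(n):
    """Closed form (7.33): (7/9) 2^n + (n/3 + 2/9) (-1)^n, kept exact."""
    return Fraction(7, 9) * 2**n + (Fraction(n, 3) + Fraction(2, 9)) * (-1) ** n


print(g_by_recurrence(8))  # [1, 1, 4, 5, 14, 23, 52, 97]
print(g_closed(7))         # 97
```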

Could we have discovered (7.33) by guesswork? Perhaps after tabulating a few more values we may have observed that $g_{n+1} \approx 2g_n$ when $n$ is large. And with chutzpah and luck we might even have been able to smoke out the constant $\frac79$. But it sure is simpler and more reliable to have generating functions as a tool.

Example 3: Mutually recursive sequences. 

Sometimes we have two or more recurrences that depend on each other. 
Then we can form generating functions for both of them, and solve both by 
a simple extension of our four-step method. 

For example, let’s return to the problem of $3 \times n$ domino tilings that we explored earlier this chapter. If we want to know only the total number of ways, $U_n$, to cover a $3 \times n$ rectangle with dominoes, without breaking this number down into vertical dominoes versus horizontal dominoes, we needn’t go into as much detail as we did before. We can merely set up the recurrences

$$U_0 = 1,\quad U_1 = 0;\qquad V_0 = 0,\quad V_1 = 1;$$
$$U_n = 2V_{n-1} + U_{n-2},\qquad V_n = U_{n-1} + V_{n-2}, \quad\text{for } n \ge 2.$$

Here $V_n$ is the number of ways to cover a $3 \times n$ rectangle-minus-corner, using $(3n-1)/2$ dominoes. These recurrences are easy to discover, if we consider the possible domino configurations at the rectangle’s left edge, as before. Here are the values of $U_n$ and $V_n$ for small $n$:

n      0    1    2    3    4    5    6    7
U_n    1    0    3    0   11    0   41    0
V_n    0    1    0    4    0   15    0   56
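The table can be regenerated directly from the two coupled recurrences. In this Python sketch (the function name is illustrative), both sequences are built side by side, each step reading previously computed values of the other.

```python
def domino_counts(count):
    """Return (U, V) lists for 0 <= n < count, from
    U_0 = 1, U_1 = 0, V_0 = 0, V_1 = 1 and, for n >= 2,
    U_n = 2 V_{n-1} + U_{n-2},  V_n = U_{n-1} + V_{n-2}."""
    U, V = [1, 0], [0, 1]
    for n in range(2, count):
        U.append(2 * V[n - 1] + U[n - 2])
        V.append(U[n - 1] + V[n - 2])
    return U[:count], V[:count]


U, V = domino_counts(8)
print(U)  # [1, 0, 3, 0, 11, 0, 41, 0]
print(V)  # [0, 1, 0, 4, 0, 15, 0, 56]
```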


Let’s find closed forms, in four steps. First (Step 1), we have

$$U_n = 2V_{n-1} + U_{n-2} + [n=0], \qquad V_n = U_{n-1} + V_{n-2},$$

for all $n$. Hence (Step 2),

$$U(z) = 2zV(z) + z^2U(z) + 1, \qquad V(z) = zU(z) + z^2V(z).$$


Now (Step 3) we must solve two equations in two unknowns; but these are easy, since the second equation yields $V(z) = zU(z)/(1-z^2)$; we find

$$U(z) = \frac{1-z^2}{1-4z^2+z^4}, \qquad V(z) = \frac{z}{1-4z^2+z^4}. \tag{7.35}$$


(We had this formula for $U(z)$ in (7.10), but with $z^3$ instead of $z^2$. In that derivation, $n$ was the number of dominoes; now it’s the width of the rectangle.)

The denominator $1 - 4z^2 + z^4$ is a function of $z^2$; this is what makes $U_{2n+1} = 0$ and $V_{2n} = 0$, as they should be. We can take advantage of this nice property of $z^2$ by retaining $z^2$ when we factor the denominator: We need not take $1 - 4z^2 + z^4$ all the way to a product of four factors $(1 - \rho_k z)$, since two factors of the form $(1 - \rho_k z^2)$ will be enough to tell us the coefficients. In other words if we consider the generating function

$$W(z) = \frac{1}{1-4z+z^2} = W_0 + W_1 z + W_2 z^2 + \cdots, \tag{7.36}$$

we will have $V(z) = zW(z^2)$ and $U(z) = (1-z^2)W(z^2)$; hence $V_{2n+1} = W_n$ and $U_{2n} = W_n - W_{n-1}$. We save time and energy by working with the simpler function $W(z)$.

I’ve known slippery floors too.

The factors of $1 - 4z + z^2$ are $(z - 2 - \sqrt3)$ and $(z - 2 + \sqrt3)$, and they can also be written $(1 - (2+\sqrt3)z)$ and $(1 - (2-\sqrt3)z)$ because this polynomial is its own reflection. Thus it turns out that we have

$$\begin{aligned}
V_{2n+1} = W_n &= \frac{3+2\sqrt3}{6}(2+\sqrt3)^n + \frac{3-2\sqrt3}{6}(2-\sqrt3)^n;\\
U_{2n} = W_n - W_{n-1} &= \frac{3+\sqrt3}{6}(2+\sqrt3)^n + \frac{3-\sqrt3}{6}(2-\sqrt3)^n\\
&= \frac{(2+\sqrt3)^n}{3-\sqrt3} + \frac{(2-\sqrt3)^n}{3+\sqrt3}.
\end{aligned}\tag{7.37}$$


This is the desired closed form for the number of $3 \times n$ domino tilings.

Incidentally, we can simplify the formula for $U_{2n}$ by realizing that the second term always lies between 0 and 1. The number $U_{2n}$ is an integer, so we have

$$U_{2n} = \left\lceil\frac{(2+\sqrt3)^n}{3-\sqrt3}\right\rceil, \qquad\text{for } n \ge 0. \tag{7.38}$$


In fact, the other term $(2-\sqrt3)^n/(3+\sqrt3)$ is extremely small when $n$ is large, because $2 - \sqrt3 \approx 0.268$. This needs to be taken into account if we try to use formula (7.38) in numerical calculations. For example, a fairly expensive name-brand hand calculator comes up with 413403.0005 when asked to compute $(2+\sqrt3)^{10}/(3-\sqrt3)$. This is correct to nine significant figures; but the true value is slightly less than 413403, not slightly greater. Therefore it would be a mistake to take the ceiling of 413403.0005; the correct answer, $U_{20} = 413403$, is obtained by rounding to the nearest integer. Ceilings can be hazardous.
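The safe way to follow this advice in code is to keep the exact integer computation separate from the floating-point estimate. The Python sketch below is illustrative (helper names are ours): it computes $W_n$ from $W_n = 4W_{n-1} - W_{n-2}$, which follows from $W(z)(1-4z+z^2) = 1$, forms $U_{2n} = W_n - W_{n-1}$ exactly, and compares it with the dominant term of (7.37) rounded to the nearest integer. Double precision is assumed to be accurate enough for the small $n$ tested here; for large $n$ only the exact version should be trusted.

```python
import math


def W(n):
    """[z^n] 1/(1 - 4z + z^2), via W_n = 4 W_{n-1} - W_{n-2} with W_{-1} = 0, W_0 = 1."""
    prev, cur = 0, 1
    for _ in range(n):
        prev, cur = cur, 4 * cur - prev
    return cur


def U_even(n):
    """Exact U_{2n} = W_n - W_{n-1}: domino tilings of a 3 x 2n rectangle."""
    return W(n) - (W(n - 1) if n > 0 else 0)


def U_even_rounded(n):
    """Dominant term of (7.37), rounded to the nearest integer (not the ceiling)."""
    return round((2 + math.sqrt(3)) ** n / (3 - math.sqrt(3)))


print(U_even(10))              # 413403
print(math.ceil(413403.0005))  # 413404: the ceiling of a slightly-too-large estimate is wrong
```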


Example 4: A closed form for change. 

When we left the problem of making change, we had just calculated the 
number of ways to pay 50¢. Let’s try now to count the number of ways there are to change a dollar, or a million dollars, still using only pennies, nickels, dimes, quarters, and halves.
The generating function derived earlier is

$$C(z) = \frac{1}{1-z}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}};$$

this is a rational function of $z$ with a denominator of degree 91. Therefore we can decompose the denominator into 91 factors and come up with a 91-term “closed form” for $C_n$, the number of ways to give $n$ cents in change. But that’s too horrible to contemplate. Can’t we do better than the general method suggests, in this particular case?

One ray of hope suggests itself immediately, when we notice that the denominator is almost a function of $z^5$. The trick we just used to simplify the calculations by noting that $1 - 4z^2 + z^4$ is a function of $z^2$ can be applied to $C(z)$, if we replace $1/(1-z)$ by $(1+z+z^2+z^3+z^4)/(1-z^5)$:

$$C(z) = \frac{1+z+z^2+z^3+z^4}{1-z^5}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}} = (1+z+z^2+z^3+z^4)\,\check C(z^5),$$

$$\check C(z) = \frac{1}{1-z}\,\frac{1}{1-z}\,\frac{1}{1-z^2}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}.$$

Now we’re also getting compressed reasoning.

The compressed function $\check C(z)$ has a denominator whose degree is only 19, so it’s much more tractable than the original. This new expression for $C(z)$ shows us, incidentally, that $C_{5n} = C_{5n+1} = C_{5n+2} = C_{5n+3} = C_{5n+4}$; and indeed, this set of equations is obvious in retrospect: The number of ways to leave a 53¢ tip is the same as the number of ways to leave a 50¢ tip, because the number of pennies is predetermined modulo 5.

But $\check C(z)$ still doesn’t have a really simple closed form based on the roots of the denominator. The easiest way to compute the coefficients of $\check C(z)$ is probably to recognize that each of the denominator factors is a divisor of $1 - z^{10}$. Hence we can write

$$\check C(z) = \frac{A(z)}{(1-z^{10})^5}, \quad\text{where } A(z) = A_0 + A_1 z + \cdots + A_{31}z^{31}. \tag{7.39}$$

The actual value of $A(z)$, for the curious, is

$$\begin{aligned}
&(1+z+\cdots+z^9)^2(1+z^2+\cdots+z^8)(1+z^5)\\
&\quad= 1 + 2z + 4z^2 + 6z^3 + 9z^4 + 13z^5 + 18z^6 + 24z^7\\
&\qquad + 31z^8 + 39z^9 + 45z^{10} + 52z^{11} + 57z^{12} + 63z^{13} + 67z^{14} + 69z^{15}\\
&\qquad + 69z^{16} + 67z^{17} + 63z^{18} + 57z^{19} + 52z^{20} + 45z^{21} + 39z^{22} + 31z^{23}\\
&\qquad + 24z^{24} + 18z^{25} + 13z^{26} + 9z^{27} + 6z^{28} + 4z^{29} + 2z^{30} + z^{31}.
\end{aligned}$$





Nowadays people are talking femtoseconds.


Finally, since $1/(1-z^{10})^5 = \sum_{k\ge0}\binom{k+4}{4}z^{10k}$, we can determine the coefficient $\check C_n = [z^n]\,\check C(z)$ as follows, when $n = 10q + r$ and $0 \le r < 10$:

$$\begin{aligned}
\check C_{10q+r} &= \sum_{j,k} A_j\binom{k+4}{4}[10q + r = 10k + j]\\
&= A_r\binom{q+4}{4} + A_{r+10}\binom{q+3}{4} + A_{r+20}\binom{q+2}{4} + A_{r+30}\binom{q+1}{4}.
\end{aligned}\tag{7.40}$$


This gives ten cases, one for each value of r; but it’s a pretty good closed 
form, compared with alternatives that involve powers of complex numbers. 

For example, we can use this expression to deduce the value of $C_{50q} = \check C_{10q}$. Then $r = 0$ and we have

$$C_{50q} = \binom{q+4}{4} + 45\binom{q+3}{4} + 52\binom{q+2}{4} + 2\binom{q+1}{4}.$$

The number of ways to change 50¢ is $\binom54 + 45\binom44 = 50$; the number of ways to change a dollar is $\binom64 + 45\binom54 + 52\binom44 = 292$; and the number of ways to change a million dollars is

$$\binom{2000004}{4} + 45\binom{2000003}{4} + 52\binom{2000002}{4} + 2\binom{2000001}{4} = 66666793333412666685000001.$$
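These numbers are easy to cross-check by machine. The Python sketch below is illustrative (helper names are ours): it builds $A(z)$ by multiplying the polynomials in (7.39), evaluates $C_n$ from (7.40) using $C_n = \check C_{\lfloor n/5\rfloor}$, and can be compared with a standard dynamic-programming count over the five coin values.

```python
from math import comb

COINS = (1, 5, 10, 25, 50)


def change_table(limit):
    """Ways to pay each amount 0..limit with COINS; expands C(z) one factor at a time."""
    ways = [1] + [0] * limit
    for coin in COINS:
        for amount in range(coin, limit + 1):
            ways[amount] += ways[amount - coin]
    return ways


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# A(z) = (1 + z + ... + z^9)^2 (1 + z^2 + ... + z^8)(1 + z^5), as in (7.39).
A = poly_mul(poly_mul(poly_mul([1] * 10, [1] * 10), [1, 0] * 4 + [1]), [1, 0, 0, 0, 0, 1])


def change_closed(n):
    """C_n = C-check_{floor(n/5)}, evaluated by (7.40) with floor(n/5) = 10q + r."""
    q, r = divmod(n // 5, 10)
    return sum(A[r + 10 * t] * comb(q + 4 - t, 4)
               for t in range(4) if r + 10 * t < len(A))


print(change_closed(100))          # 292
print(change_closed(100_000_000))  # 66666793333412666685000001
```

The dynamic-programming table is only practical for small amounts, which is exactly why the closed form (7.40) is useful for a million dollars.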


Example 5: A divergent series. 

Now let’s try to get a closed form for the numbers $g_n$ defined by

$$g_0 = 1;\qquad g_n = ng_{n-1}, \quad\text{for } n > 0.$$

After staring at this for a few nanoseconds we realize that $g_n$ is just $n!$; in
fact, the method of summation factors described in Chapter 2 suggests this 
answer immediately. But let’s try to solve the recurrence with generating 
functions, just to see what happens. (A powerful technique should be able to 
handle easy recurrences like this, as well as others that have answers we can’t 
guess so easily.) 

The equation

$$g_n = ng_{n-1} + [n=0]$$

holds for all $n$, and it leads to

$$G(z) = \sum_n g_n z^n = \sum_n ng_{n-1}z^n + \sum_{n=0} z^n.$$

To complete Step 2, we want to express $\sum_n ng_{n-1}z^n$ in terms of $G(z)$, and the basic maneuvers in Table 335 suggest that the derivative $G'(z) = \sum_n ng_n z^{n-1}$ is somehow involved. So we steer toward that kind of sum:

$$\begin{aligned}
G(z) &= 1 + \sum_n (n+1)g_n z^{n+1}\\
&= 1 + \sum_n ng_n z^{n+1} + \sum_n g_n z^{n+1}\\
&= 1 + z^2G'(z) + zG(z).
\end{aligned}$$


Let’s check this equation, using the values of $g_n$ for small $n$. Since

$$\begin{aligned}
G &= 1 + z + 2z^2 + 6z^3 + 24z^4 + \cdots,\\
G' &= 1 + 4z + 18z^2 + 96z^3 + \cdots,
\end{aligned}$$

we have

$$\begin{aligned}
z^2G' &= z^2 + 4z^3 + 18z^4 + 96z^5 + \cdots,\\
zG &= z + z^2 + 2z^3 + 6z^4 + 24z^5 + \cdots,\\
1 &= 1.
\end{aligned}$$

These three lines add up to $G$, so we’re fine so far. Incidentally, we often find it convenient to write ‘$G$’ instead of ‘$G(z)$’; the extra ‘$(z)$’ just clutters up the formula when we aren’t changing $z$.

Step 3 is next, and it’s different from what we’ve done before because we 
have a differential equation to solve. But this is a differential equation that 
we can handle with the hypergeometric series techniques of Section 5.6; those 
techniques aren’t too bad. (Readers who are unfamiliar with hypergeometrics 
needn’t worry — this will be quick.) 

First we must get rid of the constant ‘1’, so we take the derivative of both sides:

$$\begin{aligned}
G' = (z^2G' + zG + 1)' &= (2zG' + z^2G'') + (G + zG')\\
&= z^2G'' + 3zG' + G.
\end{aligned}$$

The theory in Chapter 5 tells us to rewrite this using the $\vartheta$ operator, and we know from exercise 6.13 that

$$\vartheta G = zG', \qquad \vartheta^2G = z^2G'' + zG'.$$

Therefore the desired form of the differential equation is

$$\vartheta G = z\vartheta^2G + 2z\vartheta G + zG = z(\vartheta+1)^2G.$$

According to (5.109), the solution with $g_0 = 1$ is the hypergeometric series $F(1,1;\,;z)$.


“This will be quick.” That’s what the doctor said just before he stuck me with that needle. Come to think of it, “hypergeometric” sounds a lot like “hypodermic.”
Step 3 was more than we bargained for; but now that we know what the function $G$ is, Step 4 is easy, since the hypergeometric definition (5.76) gives us the power series expansion:

$$G(z) = F(1,1;\,;z) = \sum_{n\ge0}\frac{1^{\overline n}\,1^{\overline n}\,z^n}{n!} = \sum_{n\ge0} n!\,z^n.$$

We’ve confirmed the closed form we knew all along, $g_n = n!$.

Notice that the technique gave the right answer even though $G(z)$ diverges for all nonzero $z$. The sequence $n!$ grows so fast, the terms $|n!\,z^n|$ approach $\infty$ as $n \to \infty$, unless $z = 0$. This shows that formal power series can be manipulated algebraically without worrying about convergence.
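Because a formal power series is just its coefficient sequence, the identity $G = 1 + z^2G' + zG$ can be checked without ever evaluating $G$ at a point. This Python sketch (illustrative helper names) represents truncated series as lists of exact integers and verifies that the sequence $g_n = n!$ reproduces itself under the right-hand side.

```python
from math import factorial


def derivative(coeffs):
    """Formal derivative: [c_0, c_1, c_2, ...] -> [c_1, 2 c_2, 3 c_3, ...]."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def times_z_power(coeffs, k):
    """Multiply a series by z^k (shift coefficients to the right)."""
    return [0] * k + list(coeffs)


def rhs(g, N):
    """First N coefficients of 1 + z^2 G'(z) + z G(z), given at least N coefficients of G."""
    parts = [times_z_power(derivative(g), 2), times_z_power(g, 1), [1]]
    return [sum(p[n] for p in parts if n < len(p)) for n in range(N)]


g = [factorial(n) for n in range(25)]
print(rhs(g, 25) == g)  # True, although sum n! z^n converges only at z = 0
```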


Example 6: A recurrence that goes all the way back. 

Let’s close this section by applying generating functions to a problem in graph theory. A fan of order $n$ is a graph on the vertices $\{0, 1, \ldots, n\}$ with $2n - 1$ edges defined as follows: Vertex 0 is connected by an edge to each of the other $n$ vertices, and vertex $k$ is connected by an edge to vertex $k + 1$, for $1 \le k < n$. Here, for example, is the fan of order 4, which has five vertices and seven edges.

[Figure: the fan of order 4]


The problem of interest: How many spanning trees $f_n$ are in such a graph? A spanning tree is a subgraph containing all the vertices, and containing enough edges to make the subgraph connected yet not so many that it has a cycle. It turns out that every spanning tree of a graph on $n + 1$ vertices has exactly $n$ edges. With fewer than $n$ edges the subgraph wouldn’t be connected, and with more than $n$ it would have a cycle; graph theory books prove this.

There are $\binom{2n-1}{n}$ ways to choose $n$ edges from among the $2n - 1$ present in a fan of order $n$, but these choices don’t always yield a spanning tree. For instance the subgraph

[Figure: a four-edge subgraph of the fan of order 4]

has four edges but is not a spanning tree; it has a cycle from 0 to 4 to 3 to 0, and it has no connection between $\{1, 2\}$ and the other vertices. We want to count how many of the $\binom{2n-1}{n}$ choices actually do yield spanning trees.
Let’s look at some small cases. It’s pretty easy to enumerate the spanning trees for $n = 1$, 2, and 3:

[Figure: the spanning trees of the fans of orders 1, 2, and 3]

$$f_1 = 1 \qquad f_2 = 3 \qquad f_3 = 8$$

(We need not show the labels on the vertices, if we always draw vertex 0 at the left.) What about the case $n = 0$? At first it seems reasonable to set $f_0 = 1$; but we’ll take $f_0 = 0$, because the existence of a fan of order 0 (which should have $2n - 1 = -1$ edges) is dubious.
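Small cases like these can also be enumerated by brute force, which gives an independent check on whatever recurrence we derive next. The Python sketch below is illustrative (the function names are ours): it lists the $2n - 1$ fan edges, tries every $n$-edge subset, and uses a union-find structure to keep only the acyclic ones; since each subset has $n$ edges on $n + 1$ vertices, acyclic means spanning tree.

```python
from itertools import combinations


def fan_edges(n):
    """Edges of the fan of order n: 0-k for 1 <= k <= n, and k-(k+1) for 1 <= k < n."""
    return [(0, k) for k in range(1, n + 1)] + [(k, k + 1) for k in range(1, n)]


def is_acyclic(num_vertices, edges):
    """Union-find test: True if the given edges contain no cycle."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True


def count_spanning_trees(n):
    """Number of n-edge subsets of the fan of order n that are acyclic, hence spanning trees."""
    return sum(1 for chosen in combinations(fan_edges(n), n) if is_acyclic(n + 1, chosen))


print([count_spanning_trees(n) for n in range(1, 5)])  # [1, 3, 8, 21]
```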

Our four-step procedure tells us to find a recurrence for $f_n$ that holds for all $n$. We can get a recurrence by observing how the topmost vertex (vertex $n$) is connected to the rest of the spanning tree. If it’s not connected to vertex 0, it must be connected to vertex $n - 1$, since it must be connected to the rest of the graph. In this case, any of the $f_{n-1}$ spanning trees for the remaining fan (on the vertices 0 through $n - 1$) will complete a spanning tree for the whole graph. Otherwise vertex $n$ is connected to 0, and there’s some number $k \le n$ such that vertices $n, n-1, \ldots, k$ are connected directly but the edge between $k$ and $k - 1$ is not present. Then there can’t be any edges between 0 and $\{n-1, \ldots, k\}$, or there would be a cycle. If $k = 1$, the spanning tree is therefore determined completely. And if $k > 1$, any of the $f_{k-1}$ ways to produce a spanning tree on $\{0, 1, \ldots, k-1\}$ will yield a spanning tree on the whole graph. For example, here’s what this analysis produces when $n = 4$:

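[Figure: the cases for $n = 4$. If vertex 4 is not joined to 0 there are $f_3$ trees; if it is joined to 0, the cases $k = 4, 3, 2, 1$ give $f_3$, $f_2$, $f_1$, and 1 trees, respectively]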

The general equation, valid for $n \ge 1$, is

$$f_n = f_{n-1} + f_{n-1} + f_{n-2} + f_{n-3} + \cdots + f_1 + 1.$$

(It almost seems as though the ‘1’ on the end is $f_0$ and we should have chosen $f_0 = 1$; but we will doggedly stick with our choice.) A few changes suffice to make the equation valid for all integers $n$:

$$f_n = f_{n-1} + \sum_{k<n} f_k + [n>0]. \tag{7.41}$$
This is a recurrence that “goes all the way back” from $f_{n-1}$ through all previous values, so it’s different from the other recurrences we’ve seen so far in this chapter. We used a special method to get rid of a similar right-side sum in Chapter 2, when we solved the quicksort recurrence (2.12); namely, we subtracted one instance of the recurrence from another ($f_{n+1} - f_n$). This trick would get rid of the $\sum$ now, as it did then; but we’ll see that generating functions allow us to work directly with such sums. (And it’s a good thing that they do, because we will be seeing much more complicated recurrences before long.)

Step 1 is finished; Step 2 is where we need to do a new thing:

$$\begin{aligned}
F(z) = \sum_n f_n z^n
&= \sum_n f_{n-1}z^n + \sum_{k,n} f_k z^n[k<n] + \sum_n [n>0]z^n\\
&= zF(z) + \sum_k f_k z^k\sum_n [n>k]z^{n-k} + \frac{z}{1-z}\\
&= zF(z) + F(z)\sum_{m>0} z^m + \frac{z}{1-z}\\
&= zF(z) + F(z)\frac{z}{1-z} + \frac{z}{1-z}.
\end{aligned}$$

The key trick here was to change $z^n$ to $z^k z^{n-k}$; this made it possible to express the value of the double sum in terms of $F(z)$, as required in Step 2. Now Step 3 is simple algebra, and we find

$$F(z) = \frac{z}{1-3z+z^2}.$$


Those of us with a zest for memorization will recognize this as the generating function (7.24) for the even-numbered Fibonacci numbers. So, we needn’t go through Step 4; we have found a somewhat surprising answer to the spans-of-fans problem:

$$f_n = F_{2n}, \qquad\text{for } n \ge 0. \tag{7.42}$$
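The recurrence (7.41) is easy to run directly, so the surprising answer (7.42) can be tested numerically. In this Python sketch (helper names are illustrative), a running total of earlier values plays the role of $\sum_{k<n} f_k$.

```python
def fan_counts(count):
    """f_0, ..., f_{count-1} from (7.41): f_n = f_{n-1} + sum_{k<n} f_k + [n > 0],
    where terms with negative index are zero."""
    f = []
    running_sum = 0  # sum of f_k for k < n
    for n in range(count):
        previous = f[n - 1] if n > 0 else 0
        f.append(previous + running_sum + (1 if n > 0 else 0))
        running_sum += f[n]
    return f


def fibonacci(count):
    F = [0, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]


print(fan_counts(6))  # [0, 1, 3, 8, 21, 55]
```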


7.4 SPECIAL GENERATING FUNCTIONS 

Step 4 of the four-step procedure becomes much easier if we know 
the coefficients of lots of different power series. The expansions in Table 336 
are quite useful, as far as they go, but many other types of closed forms are 
possible. Therefore we ought to supplement that table with another one, 
which lists power series that correspond to the “special numbers” considered 
in Chapter 6. 
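Table 352 Generating functions for special numbers.

$$\frac{1}{(1-z)^{m+1}}\ln\frac{1}{1-z} = \sum_{n\ge0}(H_{m+n}-H_m)\binom{m+n}{n}z^n \tag{7.43}$$

$$\frac{z}{e^z-1} = \sum_{n\ge0}B_n\frac{z^n}{n!} \tag{7.44}$$

$$\frac{F_m z}{1-(F_{m-1}+F_{m+1})z+(-1)^m z^2} = \sum_{n\ge0}F_{mn}z^n \tag{7.45}$$

$$\sum_k {m\brace k}\frac{k!\,z^k}{(1-z)^{k+1}} = \sum_{n\ge0}n^m z^n \tag{7.46}$$

$$(z^{-1})^{\overline{-m}} = \frac{z^m}{(1-z)(1-2z)\ldots(1-mz)} = \sum_{n\ge0}{n\brace m}z^n \tag{7.47}$$

$$z^{\overline m} = z(z+1)\ldots(z+m-1) = \sum_{n\ge0}{m\brack n}z^n \tag{7.48}$$

$$(e^z-1)^m = m!\sum_{n\ge0}{n\brace m}\frac{z^n}{n!} \tag{7.49}$$

$$\Bigl(\ln\frac{1}{1-z}\Bigr)^m = m!\sum_{n\ge0}{n\brack m}\frac{z^n}{n!} \tag{7.50}$$

$$\Bigl(\frac{z}{\ln(1+z)}\Bigr)^m = \sum_{n\ge0}\frac{z^n}{n!}{m\brace m-n}\Big/\binom{m-1}{n} \tag{7.51}$$

$$\Bigl(\frac{z}{1-e^{-z}}\Bigr)^m = \sum_{n\ge0}\frac{z^n}{n!}{m\brack m-n}\Big/\binom{m-1}{n} \tag{7.52}$$

$$e^{z+wz} = \sum_{m,n\ge0}\binom{n}{m}w^m\frac{z^n}{n!} \tag{7.53}$$

$$e^{w(e^z-1)} = \sum_{m,n\ge0}{n\brace m}w^m\frac{z^n}{n!} \tag{7.54}$$

$$\frac{1}{(1-z)^w} = \sum_{m,n\ge0}{n\brack m}w^m\frac{z^n}{n!} \tag{7.55}$$

$$\frac{1-w}{e^{(w-1)z}-w} = \sum_{m,n\ge0}\left\langle{n\atop m}\right\rangle w^m\frac{z^n}{n!} \tag{7.56}$$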
Table 352 is the database we need. The identities in this table are not difficult to prove, so we needn’t dwell on them; this table is primarily for reference when we meet a new problem. But there’s a nice proof of the first formula, (7.43), that deserves mention: We start with the identity

$$\frac{1}{(1-z)^{x+1}} = \sum_n\binom{x+n}{n}z^n$$

and differentiate it with respect to $x$. On the left, $(1-z)^{-x-1}$ is equal to $e^{(x+1)\ln(1/(1-z))}$, so $d/dx$ contributes a factor of $\ln(1/(1-z))$. On the right, the numerator of $\binom{x+n}{n}$ is $(x+n)\ldots(x+1)$, and $d/dx$ splits this into $n$ terms whose sum is equivalent to multiplying $\binom{x+n}{n}$ by

$$\frac{1}{x+n} + \cdots + \frac{1}{x+1} = H_{x+n} - H_x.$$

Replacing $x$ by $m$ gives (7.43). Notice that $H_{x+n} - H_x$ is meaningful even when $x$ is not an integer.
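The differentiation step can be spot-checked numerically for non-integer $x$. This Python sketch is illustrative (the names, and the central finite difference with step $h = 10^{-6}$, are our choices, not the book's): it compares the numerical derivative of $\binom{x+n}{n}$ with $\binom{x+n}{n}(H_{x+n} - H_x)$, where $H_{x+n} - H_x$ is computed as $\frac{1}{x+1} + \cdots + \frac{1}{x+n}$.

```python
import math


def binom_x_plus_n(x, n):
    """binom(x+n, n) = (x+n)(x+n-1)...(x+1) / n!, valid for real x."""
    numerator = 1.0
    for k in range(1, n + 1):
        numerator *= x + k
    return numerator / math.factorial(n)


def harmonic_difference(x, n):
    """H_{x+n} - H_x = 1/(x+1) + ... + 1/(x+n), meaningful for non-integer x."""
    return sum(1.0 / (x + k) for k in range(1, n + 1))


def numeric_derivative(x, n, h=1e-6):
    """Central-difference approximation to d/dx binom(x+n, n)."""
    return (binom_x_plus_n(x + h, n) - binom_x_plus_n(x - h, n)) / (2 * h)
```

For example, `numeric_derivative(2.25, 4)` should agree with `binom_x_plus_n(2.25, 4) * harmonic_difference(2.25, 4)` to many significant figures.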

By the way, this method of differentiating a complicated product (leaving it as a product) is usually better than expressing the derivative as a sum. For example the right side of

$$\frac{d}{dx}\bigl((x+n)^n\ldots(x+1)^1\bigr) = (x+n)^n\ldots(x+1)^1\Bigl(\frac{n}{x+n} + \cdots + \frac{1}{x+1}\Bigr)$$

would be a lot messier written out as a sum.

The general identities in Table 352 include many important special cases. For example, (7.43) simplifies to the generating function for $H_n$ when $m = 0$:

$$\frac{1}{1-z}\ln\frac{1}{1-z} = \sum_n H_n z^n. \tag{7.57}$$

This equation can also be derived in other ways; for example, we can take the power series for $\ln(1/(1-z))$ and divide it by $1 - z$ to get cumulative sums.
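That alternative derivation is a one-liner with exact fractions. In this illustrative Python sketch (names are ours), the coefficients $1/n$ of $\ln(1/(1-z))$ are accumulated, since dividing a power series by $1 - z$ replaces each coefficient by the sum of all coefficients up to that point, and the result is compared with $H_n$.

```python
from fractions import Fraction
from itertools import accumulate


def log_series(N):
    """First N coefficients of ln(1/(1-z)) = z + z^2/2 + z^3/3 + ..."""
    return [Fraction(0)] + [Fraction(1, n) for n in range(1, N)]


def divide_by_one_minus_z(coeffs):
    """Coefficients of A(z)/(1-z): the partial sums of the coefficients of A(z)."""
    return list(accumulate(coeffs))


def harmonic(n):
    """H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))
```

The first five entries of `divide_by_one_minus_z(log_series(5))` are $0, 1, \frac32, \frac{11}{6}, \frac{25}{12}$, that is, $H_0$ through $H_4$.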

Identities (7.51) and (7.52) involve the respective ratios ${m\brace m-n}\big/\binom{m-1}{n}$ and ${m\brack m-n}\big/\binom{m-1}{n}$, which have the undefined form $0/0$ when $n \ge m$. However, there is a way to give them a proper meaning using the Stirling polynomials of (6.45), because we have

$${m\brace m-n}\Big/\binom{m-1}{n} = (-1)^{n+1}\,n!\,m\,\sigma_n(n-m); \tag{7.58}$$

$${m\brack m-n}\Big/\binom{m-1}{n} = n!\,m\,\sigma_n(m). \tag{7.59}$$

Thus, for example, the case $m = 1$ of (7.51) should not be regarded as the power series $\sum_{n\ge0}\frac{z^n}{n!}{1\brace 1-n}\big/\binom{0}{n}$, but rather as

$$\frac{z}{\ln(1+z)} = -\sum_{n\ge0}(-z)^n\sigma_n(n-1) = 1 + \tfrac12 z - \tfrac1{12}z^2 + \cdots.$$
Identities (7.53), (7.55), (7.54), and (7.56) are “double generating functions” or “super generating functions” because they have the form $G(w,z) = \sum_{m,n} g_{m,n}w^m z^n$. The coefficient of $w^m$ is a generating function in the variable $z$; the coefficient of $z^n$ is a generating function in the variable $w$. Equation (7.56) can be put into the more symmetrical form

$$\frac{e^w - e^z}{we^z - ze^w} = \sum_{m,n}\left\langle{m+n+1\atop m}\right\rangle\frac{w^m z^n}{(m+n+1)!}. \tag{7.60}$$
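Identities like these are easy to spot-check numerically. The Python sketch below is illustrative (function names are ours): it computes Stirling subset numbers and Eulerian numbers from their standard recurrences, ${n\brace m} = m{n-1\brace m} + {n-1\brace m-1}$ and $\langle{n\atop m}\rangle = (m+1)\langle{n-1\atop m}\rangle + (n-m)\langle{n-1\atop m-1}\rangle$, then compares truncated double series with the closed forms $e^{w(e^z-1)}$, $(1-w)/(e^{(w-1)z}-w)$, and the left side of (7.60). The truncation at 40 terms and the sample point $(w, z) = (0.3, 0.5)$ are arbitrary choices.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def subset(n, m):
    """Stirling subset number {n brace m}; {0 brace 0} = 1."""
    if n == 0:
        return 1 if m == 0 else 0
    if m <= 0:
        return 0
    return m * subset(n - 1, m) + subset(n - 1, m - 1)


@lru_cache(maxsize=None)
def eulerian(n, m):
    """Eulerian number <n, m>; <0, 0> = 1 and <n, m> = 0 unless 0 <= m < n for n > 0."""
    if n == 0:
        return 1 if m == 0 else 0
    if m < 0 or m >= n:
        return 0
    return (m + 1) * eulerian(n - 1, m) + (n - m) * eulerian(n - 1, m - 1)


def series_subset(w, z, terms=40):
    """Truncation of sum_{m,n} {n brace m} w^m z^n / n!."""
    return sum(subset(n, m) * w**m * z**n / math.factorial(n)
               for n in range(terms) for m in range(n + 1))


def series_eulerian(w, z, terms=40):
    """Truncation of sum_{m,n} <n, m> w^m z^n / n!."""
    return sum(eulerian(n, m) * w**m * z**n / math.factorial(n)
               for n in range(terms) for m in range(n + 1))


def series_symmetric(w, z, terms=40):
    """Truncation of sum_{m,n} <m+n+1, m> w^m z^n / (m+n+1)!, the right side of (7.60)."""
    return sum(eulerian(m + n + 1, m) * w**m * z**n / math.factorial(m + n + 1)
               for m in range(terms) for n in range(terms))
```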


7.5 CONVOLUTIONS 


The convolution of two given sequences $\langle f_0, f_1, \ldots\rangle = \langle f_n\rangle$ and $\langle g_0, g_1, \ldots\rangle = \langle g_n\rangle$ is the sequence $\langle f_0g_0, f_0g_1 + f_1g_0, \ldots\rangle = \langle\sum_k f_kg_{n-k}\rangle$. We have observed in Sections 5.4 and 7.2 that convolution of sequences corresponds to multiplication of their generating functions. This fact makes it easy to evaluate many sums that would otherwise be difficult to handle.
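In code, the correspondence is visible directly: the convolution loop and the polynomial-multiplication loop are the same loop. A minimal Python sketch (the name `convolve` is ours), checked on $(1+z)^2(1+z)^3 = (1+z)^5$:

```python
from math import comb


def convolve(f, g):
    """Finite convolution: entry n is sum_k f[k] * g[n-k], i.e. the coefficients of F(z) G(z)."""
    h = [0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for j, gj in enumerate(g):
            h[k + j] += fk * gj
    return h


print(convolve([1, 2, 1], [1, 3, 3, 1]))  # [1, 5, 10, 10, 5, 1]
```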


I always thought convolution was what happens to my brain when I try to do a proof.


Example 1: A Fibonacci convolution. 

For example, let’s try to evaluate $\sum_{k=0}^n F_kF_{n-k}$ in closed form. This is the convolution of $\langle F_n\rangle$ with itself, so the sum must be the coefficient of $z^n$ in $F(z)^2$, where $F(z)$ is the generating function for $\langle F_n\rangle$. All we have to do is figure out the value of this coefficient.

The generating function $F(z)$ is $z/(1-z-z^2)$, a quotient of polynomials; so the general expansion theorem for rational functions tells us that the answer can be obtained from a partial fraction representation. We can use the general expansion theorem (7.30) and grind away; or we can use the fact that

$$\begin{aligned}
F(z)^2 &= \Bigl(\frac{1}{\sqrt5}\Bigl(\frac{1}{1-\phi z} - \frac{1}{1-\hat\phi z}\Bigr)\Bigr)^2\\
&= \frac15\Bigl(\frac{1}{(1-\phi z)^2} - \frac{2}{(1-\phi z)(1-\hat\phi z)} + \frac{1}{(1-\hat\phi z)^2}\Bigr)\\
&= \frac15\sum_{n\ge0}(n+1)\phi^n z^n - \frac25\sum_{n\ge0}F_{n+1}z^n + \frac15\sum_{n\ge0}(n+1)\hat\phi^n z^n.
\end{aligned}$$

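Instead of expressing the answer in terms of $\phi$ and $\hat\phi$, let’s try for a closed form in terms of Fibonacci numbers. Recalling that $\phi + \hat\phi = 1$, we have

$$\begin{aligned}
\phi^n + \hat\phi^n &= [z^n]\Bigl(\frac{1}{1-\phi z} + \frac{1}{1-\hat\phi z}\Bigr)\\
&= [z^n]\frac{2 - (\phi+\hat\phi)z}{(1-\phi z)(1-\hat\phi z)}\\
&= [z^n]\frac{2-z}{1-z-z^2} = 2F_{n+1} - F_n.
\end{aligned}$$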
I — z — Z z 
Hence

$$F(z)^2 = \frac15\sum_{n\ge0}(n+1)(2F_{n+1} - F_n)z^n - \frac25\sum_{n\ge0}F_{n+1}z^n,$$

and we have the answer we seek:

$$\sum_{k=0}^n F_kF_{n-k} = \frac{2nF_{n+1} - (n+1)F_n}{5}. \tag{7.61}$$

For example, when $n = 3$ this formula gives $F_0F_3 + F_1F_2 + F_2F_1 + F_3F_0 = 0 + 1 + 1 + 0 = 2$ on the left and $(6F_4 - 4F_3)/5 = (18 - 8)/5 = 2$ on the right.
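A quick machine check of (7.61) beyond $n = 3$: this illustrative Python sketch (helper names are ours) compares the convolution sum with the closed form in exact arithmetic.

```python
from fractions import Fraction


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_self_convolution(n):
    """sum_{k=0}^{n} F_k F_{n-k}, the n-th coefficient of F(z)^2."""
    return sum(fib(k) * fib(n - k) for k in range(n + 1))


def closed_7_61(n):
    """(2n F_{n+1} - (n+1) F_n) / 5 from (7.61), kept exact with Fraction."""
    return Fraction(2 * n * fib(n + 1) - (n + 1) * fib(n), 5)


print(closed_7_61(3))  # 2
```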

Example 2: Harmonic convolutions.

The efficiency of a certain computer method called “samplesort” depends on the value of the sum

$$T_{m,n} = \sum_{0\le k<n}\binom{k}{m}\frac{1}{n-k}, \qquad\text{integers } m, n \ge 0.$$

Exercise 5.58 obtains the value of this sum by a somewhat intricate double induction, using summation factors. It’s much easier to realize that $T_{m,n}$ is just the $n$th term in the convolution of $\langle\binom0m, \binom1m, \binom2m, \ldots\rangle$ with $\langle 0, \frac11, \frac12, \ldots\rangle$. Both sequences have simple generating functions in Table 336:

$$\sum_{n\ge0}\binom{n}{m}z^n = \frac{z^m}{(1-z)^{m+1}}; \qquad \sum_{n>0}\frac{z^n}{n} = \ln\frac{1}{1-z}.$$

Therefore, by (7.43),

$$T_{m,n} = [z^n]\frac{z^m}{(1-z)^{m+1}}\ln\frac{1}{1-z} = [z^{n-m}]\frac{1}{(1-z)^{m+1}}\ln\frac{1}{1-z} = (H_n - H_m)\binom{n}{n-m}.$$

In fact, there are many more sums that boil down to this same sort of convolution, because we have

$$\frac{1}{(1-z)^{r+1}}\ln\frac{1}{1-z}\cdot\frac{1}{(1-z)^{s+1}} = \frac{1}{(1-z)^{r+s+2}}\ln\frac{1}{1-z}$$

for all $r$ and $s$. Equating coefficients of $z^n$ gives the general identity

$$\sum_k\binom{r+k}{k}\binom{s+n-k}{n-k}(H_{r+k} - H_r) = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1}). \tag{7.62}$$
This seems almost too good to be true. But it checks, at least when $n = 2$:

$$\binom{r+1}{1}\binom{s+1}{1}\frac{1}{r+1} + \binom{r+2}{2}\binom{s+0}{0}\Bigl(\frac{1}{r+2} + \frac{1}{r+1}\Bigr) = \binom{r+s+3}{2}\Bigl(\frac{1}{r+s+3} + \frac{1}{r+s+2}\Bigr).$$

Special cases like s = 0 are as remarkable as the general case. 
And there’s more. We can use the convolution identity 


$$\sum_k\binom{r+k}{k}\binom{s+n-k}{n-k} = \binom{r+s+n+1}{n}$$

to transpose $H_r$ to the other side, since $H_r$ is independent of $k$:

$$\sum_k\binom{r+k}{k}\binom{s+n-k}{n-k}H_{r+k} = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1} + H_r). \tag{7.63}$$


There’s still more: If $r$ and $s$ are nonnegative integers $l$ and $m$, we can replace $\binom{r+k}{k}$ by $\binom{l+k}{l}$ and $\binom{s+n-k}{n-k}$ by $\binom{m+n-k}{m}$; then we can change $k$ to $k - l$ and $n$ to $n - m - l$, getting

$$\sum_{k=0}^n\binom{k}{l}\binom{n-k}{m}H_k = \binom{n+1}{l+m+1}(H_{n+1} - H_{l+m+1} + H_l), \qquad\text{integers } l, m, n \ge 0. \tag{7.64}$$

Even the special case $l = m = 0$ of this identity was difficult for us to handle in Chapter 2! (See (2.36).) We’ve come a long way.
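Both (7.62) and (7.64) can be tested exhaustively for small parameters. This Python sketch is illustrative (names are ours) and restricts $r$ and $s$ in (7.62) to nonnegative integers, because its `H` helper only handles integer arguments.

```python
from fractions import Fraction
from math import comb


def H(n):
    """Harmonic number H_n as an exact fraction (integer n >= 0 only)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def lhs_7_62(r, s, n):
    return sum(comb(r + k, k) * comb(s + n - k, n - k) * (H(r + k) - H(r)) for k in range(n + 1))


def rhs_7_62(r, s, n):
    return comb(r + s + n + 1, n) * (H(r + s + n + 1) - H(r + s + 1))


def lhs_7_64(l, m, n):
    return sum(comb(k, l) * comb(n - k, m) * H(k) for k in range(n + 1))


def rhs_7_64(l, m, n):
    return comb(n + 1, l + m + 1) * (H(n + 1) - H(l + m + 1) + H(l))


# The case l = m = 0 is the Chapter 2 sum H_0 + H_1 + ... + H_n.
print(lhs_7_64(0, 0, 4), rhs_7_64(0, 0, 4))  # 77/12 77/12
```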

Example 3: Convolutions of convolutions. 

If we form the convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$, then convolve this with a third sequence $\langle h_n\rangle$, we get a sequence whose $n$th term is

$$\sum_{j+k+l=n} f_jg_kh_l.$$

The generating function of this three-fold convolution is, of course, the three-fold product $F(z)G(z)H(z)$. In a similar way, the $m$-fold convolution of a sequence $\langle g_n\rangle$ with itself has $n$th term equal to

$$\sum_{k_1+k_2+\cdots+k_m=n} g_{k_1}g_{k_2}\ldots g_{k_m}$$

and its generating function is $G(z)^m$.

Because it’s so harmonic.
Concrete blocks.


We can apply these observations to the spans-of-fans problem considered earlier (Example 6 in Section 7.3). It turns out that there’s another way to compute $f_n$, the number of spanning trees of an $n$-fan, based on the configurations of tree edges between the vertices $\{1, 2, \ldots, n\}$: The edge between vertex $k$ and vertex $k + 1$ may or may not be selected for the tree; and each of the ways to select these edges connects up certain blocks of adjacent vertices. For example, when $n = 10$ we might connect vertices $\{1, 2\}$, $\{3\}$, $\{4, 5, 6, 7\}$, and $\{8, 9, 10\}$:

[Figure: the fan of order 10 with blocks {1, 2}, {3}, {4, 5, 6, 7}, and {8, 9, 10}]

How many spanning trees can we make, by adding additional edges to vertex 0? We need to connect 0 to each of the four blocks; and there are two ways to join 0 with $\{1, 2\}$, one way to join it with $\{3\}$, four ways with $\{4, 5, 6, 7\}$, and three ways with $\{8, 9, 10\}$, or $2\cdot1\cdot4\cdot3 = 24$ ways altogether. Summing over all possible ways to make blocks gives us the following expression for the total number of spanning trees:

$$f_n = \sum_{m>0}\;\sum_{\substack{k_1+k_2+\cdots+k_m=n\\ k_1,k_2,\ldots,k_m>0}} k_1k_2\ldots k_m. \tag{7.65}$$

For example, $f_4 = 4 + 3\cdot1 + 2\cdot2 + 1\cdot3 + 2\cdot1\cdot1 + 1\cdot2\cdot1 + 1\cdot1\cdot2 + 1\cdot1\cdot1\cdot1 = 21$.

This is the sum of $m$-fold convolutions of the sequence $\langle 0, 1, 2, 3, \ldots\rangle$, for $m = 1, 2, 3, \ldots$; hence the generating function for $\langle f_n\rangle$ is

$$F(z) = G(z) + G(z)^2 + G(z)^3 + \cdots = \frac{G(z)}{1-G(z)},$$

where $G(z)$ is the generating function for $\langle 0, 1, 2, 3, \ldots\rangle$, namely $z/(1-z)^2$. Consequently we have

$$F(z) = \frac{z}{(1-z)^2 - z} = \frac{z}{1-3z+z^2},$$

as before. This approach to $\langle f_n\rangle$ is more symmetrical and appealing than the complicated recurrence we had earlier.
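Formula (7.65) can be evaluated literally by listing every composition of $n$, which gives one more independent check on $f_n = F_{2n}$. An illustrative Python sketch (the generator and function names are ours):

```python
from math import prod


def compositions(n):
    """Yield every tuple (k_1, ..., k_m) of positive integers with k_1 + ... + k_m = n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def fans_by_blocks(n):
    """(7.65): sum of k_1 k_2 ... k_m over compositions of n with m > 0 parts."""
    return sum(prod(c) for c in compositions(n) if len(c) > 0)


print(fans_by_blocks(4))  # 21
```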


Example 4: A convoluted recurrence.

Our next example is especially important. In fact, it’s the “classic example” of why generating functions are useful in the solution of recurrences.

Suppose we have $n + 1$ variables $x_0, x_1, \ldots, x_n$ whose product is to be computed by doing $n$ multiplications. How many ways $C_n$ are there to insert parentheses into the product $x_0\cdot x_1\cdot\ldots\cdot x_n$ so that the order of multiplication is completely specified? For example, when $n = 2$ there are two ways, $x_0\cdot(x_1\cdot x_2)$ and $(x_0\cdot x_1)\cdot x_2$. And when $n = 3$ there are five ways,

$$x_0\cdot(x_1\cdot(x_2\cdot x_3)),\quad x_0\cdot((x_1\cdot x_2)\cdot x_3),\quad (x_0\cdot x_1)\cdot(x_2\cdot x_3),$$
$$(x_0\cdot(x_1\cdot x_2))\cdot x_3,\quad ((x_0\cdot x_1)\cdot x_2)\cdot x_3.$$

Thus $C_2 = 2$, $C_3 = 5$; we also have $C_1 = 1$ and $C_0 = 1$.

Let’s use the four-step procedure of Section 7.3. What is a recurrence for the $C$’s? The key observation is that there’s exactly one ‘$\cdot$’ operation outside all of the parentheses, when $n > 0$; this is the final multiplication that ties everything together. If this ‘$\cdot$’ occurs between $x_k$ and $x_{k+1}$, there are $C_k$ ways to fully parenthesize $x_0\cdot\ldots\cdot x_k$, and there are $C_{n-k-1}$ ways to fully parenthesize $x_{k+1}\cdot\ldots\cdot x_n$; hence

$$C_n = C_0C_{n-1} + C_1C_{n-2} + \cdots + C_{n-1}C_0, \qquad\text{if } n > 0.$$

By now we recognize this expression as a convolution, and we know how to patch the formula so that it holds for all integers $n$:

$$C_n = \sum_k C_kC_{n-1-k} + [n=0]. \tag{7.66}$$

Step 1 is now complete. Step 2 tells us to multiply by $z^n$ and sum:

$$\begin{aligned}
C(z) &= \sum_n C_nz^n\\
&= \sum_{k,n} C_kC_{n-1-k}z^n + \sum_{n=0} z^n\\
&= \sum_k C_kz^k\sum_n C_{n-1-k}z^{n-k} + 1\\
&= C(z)\cdot zC(z) + 1.
\end{aligned}$$

Lo and behold, the convolution has become a product, in the generating- 
function world. Life is full of surprises. 


The authors jest. 
So the convoluted recurrence has led us to an oft-recurring convolution.


Step 3 is also easy. We solve for $C(z)$ by the quadratic formula:

$$C(z) = \frac{1 \pm \sqrt{1-4z}}{2z}.$$

But should we choose the $+$ sign or the $-$ sign? Both choices yield a function that satisfies $C(z) = zC(z)^2 + 1$, but only one of the choices is suitable for our problem. We might choose the $+$ sign on the grounds that positive thinking is best; but we soon discover that this choice gives $C(0) = \infty$, contrary to the facts. (The correct function $C(z)$ is supposed to have $C(0) = C_0 = 1$.) Therefore we conclude that

$$C(z) = \frac{1 - \sqrt{1-4z}}{2z}.$$

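Finally, Step 4. What is $[z^n]\,C(z)$? The binomial theorem tells us that

$$\sqrt{1-4z} = \sum_{k\ge0}\binom{1/2}{k}(-4z)^k = 1 + \sum_{k\ge1}\frac{1}{2k}\binom{-1/2}{k-1}(-4z)^k;$$

hence, using (5.37),

$$\begin{aligned}
\frac{1-\sqrt{1-4z}}{2z} &= \sum_{k\ge1}\frac1k\binom{-1/2}{k-1}(-4z)^{k-1}\\
&= \sum_{n\ge0}\binom{-1/2}{n}\frac{(-4z)^n}{n+1} = \sum_{n\ge0}\binom{2n}{n}\frac{z^n}{n+1}.
\end{aligned}$$

The number of ways to parenthesize, $C_n$, is therefore

$$C_n = \binom{2n}{n}\frac{1}{n+1}.$$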

We anticipated this result in Chapter 5, when we introduced the sequence of Catalan numbers $\langle 1, 1, 2, 5, 14, \ldots\rangle = \langle C_n\rangle$. This sequence arises in dozens of problems that seem at first to be unrelated to each other [46], because many situations have a recursive structure that corresponds to the convolution recurrence (7.66).
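Running the convolution recurrence (7.66) directly and comparing it with $\binom{2n}{n}/(n+1)$ is a cheap sanity check. An illustrative Python sketch (the function name is ours):

```python
from math import comb


def catalan_by_recurrence(count):
    """C_0, ..., C_{count-1} from (7.66): C_n = sum_k C_k C_{n-1-k} + [n = 0]."""
    C = []
    for n in range(count):
        C.append(sum(C[k] * C[n - 1 - k] for k in range(n)) + (1 if n == 0 else 0))
    return C


print(catalan_by_recurrence(6))                       # [1, 1, 2, 5, 14, 42]
print([comb(2 * n, n) // (n + 1) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```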

For example, let’s consider the following problem: How many sequences 
(cii , Q2 . . . , Q2n) of +1 ’s and —1 ’s have the property that 


Qi + 0.2 H h 02n = 0 


and have all their partial sums 


ai , a] + a 2 , • • • , ai + a 2 H h a 2n 

nonnegative? There must be n occurrences of +1 and n occurrences of — 1. 
We can represent this problem graphically by plotting the sequence of partial 
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sums s n = XLk =1 a k as a function of n: The five solutions for n = 3 are 

Aa , AAA. 

These are “mountain ranges” of width $2n$ that can be drawn with line
segments of the forms / and \. It turns out that there are exactly $C_n$ ways to
do this, and the sequences can be related to the parenthesis problem in the
following way: Put an extra pair of parentheses around the entire formula, so
that there are $n$ pairs of parentheses corresponding to the $n$ multiplications.
Now replace each ‘·’ by +1 and each ‘)’ by −1 and erase everything else.
For example, the formula $x_0\cdot((x_1\cdot x_2)\cdot(x_3\cdot x_4))$ corresponds to the sequence
$(+1, +1, -1, +1, +1, -1, -1, -1)$ by this rule. The five ways to parenthesize
$x_0\cdot x_1\cdot x_2\cdot x_3$ correspond to the five mountain ranges for $n = 3$ shown above.
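The rule that turns a parenthesized product into a ±1 sequence is easy to mechanize. The following Python sketch is illustrative only: it writes the multiplication sign as `*`, and the names `parens_to_steps` and `is_mountain_range` are hypothetical. It reproduces the example above and checks that the five parenthesizations of $x_0\cdot x_1\cdot x_2\cdot x_3$ give five distinct mountain ranges:

```python
def parens_to_steps(formula):
    """Wrap the formula in one extra pair of parentheses, map each '*' to +1
    and each ')' to -1, and ignore every other character."""
    steps = []
    for ch in "(" + formula + ")":
        if ch == "*":
            steps.append(+1)
        elif ch == ")":
            steps.append(-1)
    return steps

def is_mountain_range(steps):
    """True if every partial sum is >= 0 and the total is 0."""
    s = 0
    for a in steps:
        s += a
        if s < 0:
            return False
    return s == 0

print(parens_to_steps("x0*((x1*x2)*(x3*x4))"))  # [1, 1, -1, 1, 1, -1, -1, -1]

five = ["((x0*x1)*x2)*x3", "(x0*(x1*x2))*x3", "(x0*x1)*(x2*x3)",
        "x0*((x1*x2)*x3)", "x0*(x1*(x2*x3))"]
ranges = {tuple(parens_to_steps(f)) for f in five}
assert len(ranges) == 5 and all(is_mountain_range(list(r)) for r in ranges)
```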

Moreover, a slight reformulation of our sequence-counting problem leads
to a surprisingly simple combinatorial solution that avoids the use of generating functions: How many sequences $(a_0, a_1, a_2, \ldots, a_{2n})$ of +1’s and −1’s
have the property that
$$a_0 + a_1 + a_2 + \cdots + a_{2n} = 1,$$
when all the partial sums
$$a_0,\quad a_0 + a_1,\quad a_0 + a_1 + a_2,\quad \ldots,\quad a_0 + a_1 + \cdots + a_{2n}$$
are required to be positive? Clearly these are just the sequences of the previous problem, with the additional element $a_0 = +1$ placed in front. But
the sequences in the new problem can be enumerated by a simple counting
argument, using a remarkable fact discovered by George Raney [302] in 1959:
If $(x_1, x_2, \ldots, x_m)$ is any sequence of integers whose sum is +1, exactly one
of the cyclic shifts
$$(x_1, x_2, \ldots, x_m),\quad (x_2, \ldots, x_m, x_1),\quad \ldots,\quad (x_m, x_1, \ldots, x_{m-1})$$

has all of its partial sums positive. For example, consider the sequence
$(3, -5, 2, -2, 3, 0)$. Its cyclic shifts are

(3, −5, 2, −2, 3, 0)      (−2, 3, 0, 3, −5, 2)
(−5, 2, −2, 3, 0, 3)      (3, 0, 3, −5, 2, −2) ✓
(2, −2, 3, 0, 3, −5)      (0, 3, −5, 2, −2, 3)

and only the one that’s checked has entirely positive partial sums.
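Raney’s lemma is also easy to test by brute force. This Python sketch (helper names are our own) lists the cyclic shifts of the example that have all partial sums positive, then checks the lemma on randomly generated integer sequences whose sum is +1; the random test is evidence, not a proof:

```python
import random
from itertools import accumulate

def cyclic_shifts(xs):
    return [xs[i:] + xs[:i] for i in range(len(xs))]

def all_partial_sums_positive(xs):
    return all(s > 0 for s in accumulate(xs))

example = [3, -5, 2, -2, 3, 0]  # total sum is +1
good = [s for s in cyclic_shifts(example) if all_partial_sums_positive(s)]
print(good)  # [[3, 0, 3, -5, 2, -2]]

random.seed(1)
for _ in range(1000):
    m = random.randint(1, 8)
    xs = [random.randint(-5, 5) for _ in range(m - 1)]
    xs.append(1 - sum(xs))  # force the total to be +1
    # exactly one shift (counted by position) has all partial sums positive
    assert sum(map(all_partial_sums_positive, cyclic_shifts(xs))) == 1
```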
Raney’s lemma can be proved by a simple geometric argument. Let’s
extend the sequence periodically to get an infinite sequence
$$(x_1, x_2, \ldots, x_m, x_1, x_2, \ldots, x_m, x_1, x_2, \ldots);$$
thus we let $x_{m+k} = x_k$ for all $k \ge 1$. If we now plot the partial sums
$s_n = x_1 + \cdots + x_n$ as a function of $n$, the graph of $s_n$ has an “average slope”
of $1/m$, because $s_{m+n} = s_n + 1$. For example, the graph corresponding to
our example sequence $(3, -5, 2, -2, 3, 0, 3, -5, 2, \ldots)$ begins as follows:



Ah, if stock prices 
would only continue 
to rise like this. 


(Attention, computer
scientists: The partial
sums in this problem
represent the stack
size as a function of
time, when a product
of n + 1 factors is
evaluated, because
each “push” operation
changes the size by +1
and each multiplication
changes it by −1.)


The entire graph can be contained between two lines of slope $1/m$, as shown;
we have $m = 6$ in the illustration. In general these bounding lines touch the
graph just once in each cycle of $m$ points, since lines of slope $1/m$ hit points
with integer coordinates only once per m units. The unique lower point of 
intersection is the only place in the cycle from which all partial sums will 
be positive, because every other point on the curve has an intersection point 
within m units to its right. 

With Raney’s lemma we can easily enumerate the sequences $(a_0, \ldots, a_{2n})$
of +1’s and −1’s whose partial sums are entirely positive and whose total
sum is +1. There are $\binom{2n+1}{n}$ sequences with $n$ occurrences of −1 and $n + 1$
occurrences of +1, and Raney’s lemma tells us that exactly $1/(2n+1)$ of
these sequences have all partial sums positive. (List all $N = \binom{2n+1}{n}$ of these
sequences and all $2n + 1$ of their cyclic shifts, in an $N \times (2n+1)$ array. Each
row contains exactly one solution. Each solution appears exactly once in each
column. So there are $N/(2n+1)$ distinct solutions in the array, each appearing
$(2n + 1)$ times.) The total number of sequences with positive partial sums is
$$\binom{2n+1}{n}\frac{1}{2n+1} = \binom{2n}{n}\frac{1}{n+1} = C_n.$$
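The counting argument can be checked directly for small $n$. This sketch (our own, not from the text) enumerates every arrangement of $n$ −1’s and $n + 1$ +1’s and counts those whose partial sums are all positive:

```python
from itertools import accumulate, combinations
from math import comb

def count_positive_sequences(n):
    """Count sequences of n+1 (+1)'s and n (-1)'s with all partial sums positive."""
    length = 2 * n + 1
    count = 0
    for minus_positions in combinations(range(length), n):
        seq = [1] * length
        for i in minus_positions:
            seq[i] = -1
        if all(s > 0 for s in accumulate(seq)):
            count += 1
    return count

for n in range(8):
    N = comb(2 * n + 1, n)
    assert N % (2 * n + 1) == 0  # Raney: each row of the N x (2n+1) array has one solution
    assert count_positive_sequences(n) == N // (2 * n + 1) == comb(2 * n, n) // (n + 1)
```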


Example 5: A recurrence with m-fold convolution. 

We can generalize the problem just considered by looking at sequences
$(a_0, \ldots, a_{mn})$ of +1’s and $(1-m)$’s whose partial sums are all positive and
whose total sum is +1. Such sequences can be called $m$-Raney sequences. If
there are $k$ occurrences of $(1-m)$ and $mn + 1 - k$ occurrences of +1, we have
$$k(1-m) + (mn + 1 - k) = 1,$$
hence $k = n$. There are $\binom{mn+1}{n}$ sequences with $n$ occurrences of $(1-m)$ and
$mn + 1 - n$ occurrences of +1, and Raney’s lemma tells us that the number
of such sequences with all partial sums positive is exactly
$$\binom{mn+1}{n}\frac{1}{mn+1} = \binom{mn}{n}\frac{1}{(m-1)n+1}. \tag{7.67}$$
So this is the number of $m$-Raney sequences. Let’s call this a Fuss-Catalan
number $C_n^{(m)}$, because the sequence $(C_n^{(m)})$ was first investigated by N. I.
Fuss [135] in 1791 (many years before Catalan himself got into the act). The
ordinary Catalan numbers are $C_n = C_n^{(2)}$.
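A brute-force count of $m$-Raney sequences for small $m$ and $n$ agrees with (7.67). The sketch below checks the formula under the definitions just given; the function names are hypothetical:

```python
from itertools import accumulate, combinations
from math import comb

def count_m_raney(m, n):
    """Brute force: sequences of length mn+1 with n copies of (1-m) and the rest +1,
    whose partial sums are all positive (the m-Raney sequences)."""
    length = m * n + 1
    count = 0
    for positions in combinations(range(length), n):
        seq = [1] * length
        for i in positions:
            seq[i] = 1 - m
        if all(s > 0 for s in accumulate(seq)):
            count += 1
    return count

def fuss_catalan(m, n):
    """C_n^(m) = binom(mn, n) / ((m-1)n + 1), formula (7.67)."""
    numerator = comb(m * n, n)
    assert numerator % ((m - 1) * n + 1) == 0
    return numerator // ((m - 1) * n + 1)

for m in range(2, 5):
    counts = [count_m_raney(m, n) for n in range(5)]
    print(m, counts)  # m = 3 gives [1, 1, 3, 12, 55]
    assert counts == [fuss_catalan(m, n) for n in range(5)]
```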

Now that we know the answer, (7.67), let’s play “Jeopardy” and figure
out a question that leads to it. In the case $m = 2$ the question was: “What
numbers $C_n$ satisfy the recurrence $C_n = \sum_k C_k C_{n-1-k} + [n = 0]$?” We will
try to find a similar question (a similar recurrence) in the general case.

The trivial sequence (+1) of length 1 is clearly an $m$-Raney sequence. If
we put the number $(1-m)$ at the right of any $m$ sequences that are $m$-Raney,
we get an $m$-Raney sequence; the partial sums stay positive as they increase
to $+2$, then $+3$, $\ldots$, $+m$, and $+1$. Conversely, we can show that all $m$-Raney
sequences $(a_0, \ldots, a_{mn})$ arise in this way, if $n > 0$: The last term $a_{mn}$ must
be $(1-m)$. The partial sums $s_j = a_0 + \cdots + a_{j-1}$ are positive for $1 \le j \le mn$,
and $s_{mn} = m$ because $s_{mn} + a_{mn} = 1$. Let $k_1$ be the largest index $\le mn$ such
that $s_{k_1} = 1$; let $k_2$ be largest such that $s_{k_2} = 2$; and so on. Thus $s_{k_j} = j$ and
$s_k > j$, for $k_j < k \le mn$ and $1 \le j \le m$. It follows that $k_m = mn$, and we
can verify without difficulty that each of the subsequences $(a_0, \ldots, a_{k_1-1})$,
$(a_{k_1}, \ldots, a_{k_2-1})$, $\ldots$, $(a_{k_{m-1}}, \ldots, a_{k_m-1})$ is an $m$-Raney sequence. We must
have $k_1 = mn_1 + 1$, $k_2 - k_1 = mn_2 + 1$, $\ldots$, $k_m - k_{m-1} = mn_m + 1$, for
some nonnegative integers $n_1, n_2, \ldots, n_m$.

Therefore $\binom{mn+1}{n}\frac{1}{mn+1}$ is the answer to the following two interesting
questions: “What are the numbers $C_n^{(m)}$ defined by the recurrence
$$C_n^{(m)} = \sum_{n_1+n_2+\cdots+n_m = n-1} C_{n_1}^{(m)} C_{n_2}^{(m)} \ldots C_{n_m}^{(m)} + [n = 0] \tag{7.68}$$
for all integers $n$?” “If $G(z)$ is a power series that satisfies
$$G(z) = zG(z)^m + 1, \tag{7.69}$$
what is $[z^n]\,G(z)$?”

(Attention, computer
scientists: The stack
interpretation now
applies with respect to
an m-ary operation,
instead of the binary
multiplication
considered earlier.)
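Recurrence (7.68) can be evaluated without knowing the answer in advance: the $m$-fold sum is the coefficient of $z^{n-1}$ in $G(z)^m$, so the coefficients of $G$ can be built up one at a time from (7.69). A minimal Python sketch (names are our own) that compares the result with the closed form:

```python
from math import comb

def mul_trunc(a, b, deg):
    """Coefficients of a(z) * b(z), keeping only terms up to z^deg."""
    out = [0] * (deg + 1)
    for i, ai in enumerate(a[:deg + 1]):
        for j, bj in enumerate(b[:deg + 1 - i]):
            out[i + j] += ai * bj
    return out

def fuss_catalan_by_recurrence(m, N):
    """[C_0^(m), ..., C_N^(m)] from recurrence (7.68), for m >= 1.

    The sum over n_1 + ... + n_m = n - 1 is the coefficient of z^(n-1)
    in G(z)^m, where G(z) = sum_n C_n^(m) z^n as in (7.69).
    """
    g = [1]  # C_0 = [0 = 0] = 1
    for n in range(1, N + 1):
        power = [1]  # G(z)^0, then G(z)^1, ..., G(z)^m
        for _ in range(m):
            power = mul_trunc(power, g, n - 1)
        g.append(power[n - 1])  # needs only C_0, ..., C_{n-1}, already known
    return g

def fuss_catalan(m, n):
    """Closed form binom(mn+1, n) / (mn+1) from (7.67)."""
    return comb(m * n + 1, n) // (m * n + 1)

for m in (2, 3, 4):
    print(m, fuss_catalan_by_recurrence(m, 6))
    assert fuss_catalan_by_recurrence(m, 6) == [fuss_catalan(m, n) for n in range(7)]
```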
Notice that these are not easy questions. In the ordinary Catalan case
($m = 2$), we solved (7.69) for $G(z)$ and its coefficients by using the quadratic
formula and the binomial theorem; but when $m = 3$, none of the standard
techniques gives any clue about how to solve the cubic equation $G = zG^3 + 1$.
So it has turned out to be easier to answer this question before asking it.

Now, however, we know enough to ask even harder questions and deduce
their answers. How about this one: “What is $[z^n]\,G(z)^l$, if $l$ is a positive
integer and if $G(z)$ is the power series defined by (7.69)?” The argument we
just gave can be used to show that $[z^n]\,G(z)^l$ is the number of sequences of
length $mn + l$ with the following three properties:

• Each element is either +1 or $(1-m)$.

• The partial sums are all positive.

• The total sum is $l$.

For we get all such sequences in a unique way by putting together $l$ sequences
that have the $m$-Raney property. The number of ways to do this is
$$\sum_{n_1+n_2+\cdots+n_l = n} C_{n_1}^{(m)} C_{n_2}^{(m)} \ldots C_{n_l}^{(m)} = [z^n]\,G(z)^l.$$


Raney proved a generalization of his lemma that tells us how to count
such sequences: If $(x_1, x_2, \ldots, x_m)$ is any sequence of integers with $x_j \le 1$ for
all $j$, and with $x_1 + x_2 + \cdots + x_m = l > 0$, then exactly $l$ of the cyclic shifts
$$(x_1, x_2, \ldots, x_m),\quad (x_2, \ldots, x_m, x_1),\quad \ldots,\quad (x_m, x_1, \ldots, x_{m-1})$$
have all positive partial sums.

For example, we can check this statement on the sequence
$(-2, 1, -1, 0, 1, 1, -1, 1, 1, 1)$. The cyclic shifts are

(−2, 1, −1, 0, 1, 1, −1, 1, 1, 1)       (1, −1, 1, 1, 1, −2, 1, −1, 0, 1)
(1, −1, 0, 1, 1, −1, 1, 1, 1, −2)       (−1, 1, 1, 1, −2, 1, −1, 0, 1, 1)
(−1, 0, 1, 1, −1, 1, 1, 1, −2, 1)       (1, 1, 1, −2, 1, −1, 0, 1, 1, −1) ✓
(0, 1, 1, −1, 1, 1, 1, −2, 1, −1)       (1, 1, −2, 1, −1, 0, 1, 1, −1, 1)
(1, 1, −1, 1, 1, 1, −2, 1, −1, 0) ✓     (1, −2, 1, −1, 0, 1, 1, −1, 1, 1)

and only the two examples marked ‘✓’ have all partial sums positive. This
generalized lemma is proved in exercise 13.
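The generalized lemma can be tested the same way. In this Python sketch (helper names hypothetical), `positive_shifts` returns the positions of the good cyclic shifts; the random test draws entries that are at most 1 and keeps only sequences whose total $l$ is positive:

```python
import random
from itertools import accumulate

def positive_shifts(xs):
    """Indices i whose cyclic shift xs[i:] + xs[:i] has all partial sums > 0."""
    return [i for i in range(len(xs))
            if all(s > 0 for s in accumulate(xs[i:] + xs[:i]))]

example = [-2, 1, -1, 0, 1, 1, -1, 1, 1, 1]  # every x_j <= 1, total l = 2
print(positive_shifts(example))  # [4, 7]

random.seed(2)
checked = 0
while checked < 2000:
    xs = [random.choice([1, 1, 1, 0, -1, -2, -3]) for _ in range(random.randint(1, 10))]
    l = sum(xs)
    if l > 0:  # the lemma only speaks about positive totals
        assert len(positive_shifts(xs)) == l
        checked += 1
```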

A sequence of +1’s and $(1-m)$’s that has length $mn + l$ and total sum $l$
must have exactly $n$ occurrences of $(1-m)$. The generalized lemma tells
us that $l/(mn+l)$ of these $\binom{mn+l}{n}$ sequences have all partial sums positive;
hence our tough question has a surprisingly simple answer:
$$[z^n]\,G(z)^l = \binom{mn+l}{n}\frac{l}{mn+l}, \tag{7.70}$$
for all integers $l > 0$.
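Formula (7.70) can be confirmed by raising a truncated power series to the $l$th power. This sketch (our own) takes the coefficients of $G(z)$ from (7.67), checks that they satisfy (7.69), and then compares $[z^n]\,G(z)^l$ with $\binom{mn+l}{n}\frac{l}{mn+l}$:

```python
from fractions import Fraction
from math import comb

def series_power(coeffs, l, N):
    """Coefficients of (sum_k coeffs[k] z^k)^l up to z^N, for an integer l >= 0."""
    c = list(coeffs) + [0] * (N + 1 - len(coeffs))  # pad with zeros if needed
    result = [1] + [0] * N
    for _ in range(l):
        result = [sum(result[i] * c[k - i] for i in range(k + 1)) for k in range(N + 1)]
    return result

def check_7_70(m, l, N):
    # Coefficients of G(z) are the Fuss-Catalan numbers (7.67).
    G = [comb(m * n + 1, n) // (m * n + 1) for n in range(N + 1)]
    # Sanity check of (7.69): [z^n] G = [z^(n-1)] G^m for n > 0.
    Gm = series_power(G, m, N)
    assert all(G[n] == Gm[n - 1] for n in range(1, N + 1))
    # Formula (7.70): [z^n] G^l = binom(mn + l, n) * l / (mn + l).
    Gl = series_power(G, l, N)
    return all(Gl[n] == Fraction(comb(m * n + l, n) * l, m * n + l) for n in range(N + 1))

print(all(check_7_70(m, l, 8) for m in range(1, 5) for l in range(1, 6)))  # True
```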

Readers who haven’t forgotten Chapter 5 might well be experiencing déjà
vu: “That formula looks familiar; haven’t we seen it before?” Yes, indeed;
Lambert’s equation (5.60) says that
$$[z^n]\,\mathcal{B}_t(z)^r = \binom{tn+r}{n}\frac{r}{tn+r}.$$
Therefore the generating function $G(z)$ in (7.69) must actually be the generalized binomial series $\mathcal{B}_m(z)$. Sure enough, equation (5.59) says
$$\mathcal{B}_m(z)^{1-m} - \mathcal{B}_m(z)^{-m} = z,$$
which is the same as
$$\mathcal{B}_m(z) - 1 = z\,\mathcal{B}_m(z)^m.$$


Let’s switch to the notation of Chapter 5, now that we know we’re dealing
with generalized binomials. Chapter 5 stated a bunch of identities without
proof. We have now closed part of the gap by proving that the power series
$\mathcal{B}_t(z)$ defined by
$$\mathcal{B}_t(z) = \sum_{n}\binom{tn+1}{n}\frac{z^n}{tn+1}$$
has the remarkable property that
$$\mathcal{B}_t(z)^r = \sum_{n}\binom{tn+r}{n}\frac{r\,z^n}{tn+r},$$
whenever $t$ and $r$ are positive integers.

Can we extend these results to arbitrary values of $t$ and $r$? Yes; because
the coefficients $\binom{tn+r}{n}\frac{r}{tn+r}$ are polynomials in $t$ and $r$. The general $r$th power
defined by
$$\mathcal{B}_t(z)^r = e^{r\ln\mathcal{B}_t(z)} = \sum_{n\ge0}\frac{(r\ln\mathcal{B}_t(z))^n}{n!} = \sum_{n\ge0}\frac{r^n}{n!}\Bigl(-\sum_{k\ge1}\frac{(1-\mathcal{B}_t(z))^k}{k}\Bigr)^{n}$$
has coefficients that are polynomials in $t$ and $r$; and those polynomials are
equal to $\binom{tn+r}{n}\frac{r}{tn+r}$ for infinitely many values of $t$ and $r$. So the two sequences
of polynomials must be identically equal.
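The polynomial argument says the identity should hold even when $t$ and $r$ are not positive integers. Here is a numerical check with exact rational arithmetic, under the assumption that $F(z)^\alpha$ means the power series with constant term 1 (which is what $e^{\alpha\ln F(z)}$ gives). The sketch computes $\mathcal{B}_t(z)^r$ with a standard recurrence for powers of a power series and compares its coefficients with $\binom{tn+r}{n}\frac{r}{tn+r}$; the test values of $t$ and $r$ are arbitrary choices that keep $tn + 1$ and $tn + r$ nonzero:

```python
from fractions import Fraction

def gen_binom(x, n):
    """Generalized binomial coefficient x(x-1)...(x-n+1)/n! for a rational x."""
    result = Fraction(1)
    for j in range(n):
        result = result * (x - j) / (j + 1)
    return result

def lambert_coeff(t, r, n):
    """binom(tn + r, n) * r / (tn + r); assumes tn + r != 0."""
    return gen_binom(t * n + r, n) * r / (t * n + r)

def series_power(f, alpha, N):
    """Coefficients of F(z)^alpha up to z^N for rational alpha, assuming f[0] == 1.

    Comparing coefficients of z^(n-1) in F * P' = alpha * F' * P, with P = F^alpha,
    gives p_n = (1/n) * sum_{k=1}^{n} ((alpha + 1) k - n) f_k p_{n-k}.
    """
    p = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        p[n] = sum(((alpha + 1) * k - n) * f[k] * p[n - k] for k in range(1, n + 1)) / n
    return p

N = 8
for t, r in [(Fraction(2), Fraction(1, 2)), (Fraction(1, 2), Fraction(-1, 3)),
             (Fraction(3), Fraction(5, 2))]:
    B = [lambert_coeff(t, 1, n) for n in range(N + 1)]  # coefficients of B_t(z)
    assert series_power(B, r, N) == [lambert_coeff(t, r, n) for n in range(N + 1)]
```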
Chapter 5 also mentions the generalized exponential series
$$\mathcal{E}_t(z) = \sum_{n\ge0}(tn+1)^{n-1}\frac{z^n}{n!},$$
which is said in (5.60) to have an equally remarkable property:
$$[z^n]\,\mathcal{E}_t(z)^r = \frac{r(tn+r)^{n-1}}{n!}. \tag{7.71}$$
We can prove this as a limiting case of the formulas for $\mathcal{B}_t(z)$, because it is
not difficult to show that
$$\mathcal{E}_t(z)^r = \lim_{x\to\infty}\mathcal{B}_{xt}(z/x)^{xr}.$$
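The same power-series recurrence gives a quick numerical check of (7.71) for a few values of $t$ and $r$ (arbitrary choices, avoiding $tn + r = 0$). This sketch tests the formula; it does not prove it:

```python
from fractions import Fraction
from math import factorial

def series_power(f, alpha, N):
    """Coefficients of F(z)^alpha up to z^N for rational alpha, assuming f[0] == 1.

    Uses p_n = (1/n) sum_{k=1}^{n} ((alpha + 1) k - n) f_k p_{n-k},
    which follows from F * P' = alpha * F' * P for P = F^alpha.
    """
    p = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        p[n] = sum(((alpha + 1) * k - n) * f[k] * p[n - k] for k in range(1, n + 1)) / n
    return p

def exp_series_coeffs(t, N):
    """Coefficients (tn + 1)^(n-1) / n! of E_t(z)."""
    return [Fraction(t * n + 1) ** (n - 1) / factorial(n) for n in range(N + 1)]

def rhs_7_71(t, r, n):
    """r (tn + r)^(n-1) / n!, assuming tn + r != 0."""
    return r * Fraction(t * n + r) ** (n - 1) / factorial(n)

N = 8
for t, r in [(1, 1), (1, 2), (2, 3), (Fraction(1, 2), Fraction(1, 2)), (3, Fraction(-1, 2))]:
    E = exp_series_coeffs(t, N)
    assert series_power(E, Fraction(r), N) == [rhs_7_71(t, r, n) for n in range(N + 1)]
```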


Are we having 
fun yet? 


7.6 EXPONENTIAL GF’S 

Sometimes a sequence $(g_n)$ has a generating function whose properties are quite complicated, while the related sequence $(g_n/n!)$ has a generating
function that’s quite simple. In such cases we naturally prefer to work with
$(g_n/n!)$ and then multiply by $n!$ at the end. This trick works sufficiently
often that we have a special name for it: We call the power series
$$G(z) = \sum_{n\ge0} g_n\frac{z^n}{n!} \tag{7.72}$$
the exponential generating function or “egf” of the sequence $(g_0, g_1, g_2, \ldots)$.
This name arises because the exponential function $e^z$ is the egf of $(1, 1, 1, \ldots)$.

Many of the generating functions in Table 352 are actually egf’s. For
example, equation (7.50) says that $(\ln\frac{1}{1-z})^m/m!$ is the egf for the sequence
$\bigl({0\brack m}, {1\brack m}, {2\brack m}, \ldots\bigr)$. The ordinary generating function for this sequence is
much more complicated (and also divergent).

Exponential generating functions have their own basic maneuvers, analogous to the operations we learned in Section 7.2. For example, if we multiply
the egf of $(g_n)$ by $z$, we get
$$\sum_{n\ge0} g_n\frac{z^{n+1}}{n!} = \sum_{n\ge1} g_{n-1}\frac{z^n}{(n-1)!} = \sum_{n\ge0} ng_{n-1}\frac{z^n}{n!};$$
this is the egf of $(0, g_0, 2g_1, \ldots) = (ng_{n-1})$.

Differentiating the egf of $(g_0, g_1, g_2, \ldots)$ with respect to $z$ gives
$$\sum_{n\ge0} g_n\frac{nz^{n-1}}{n!} = \sum_{n\ge1} g_n\frac{z^{n-1}}{(n-1)!} = \sum_{n\ge0} g_{n+1}\frac{z^n}{n!}; \tag{7.73}$$
this is the egf of $(g_1, g_2, \ldots)$. Thus differentiation on egf’s corresponds to the
left-shift operation $(G(z) - g_0)/z$ on ordinary gf’s. (We used this left-shift
property of egf’s when we studied hypergeometric series, (5.106).) Integration
of an egf gives
$$\int_0^z\sum_{n\ge0} g_n\frac{t^n}{n!}\,dt = \sum_{n\ge0} g_n\frac{z^{n+1}}{(n+1)!} = \sum_{n\ge1} g_{n-1}\frac{z^n}{n!}; \tag{7.74}$$
this is a right shift, the egf of $(0, g_0, g_1, \ldots)$.
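These three rules are easy to confirm on a concrete sequence by working with the egf coefficients $g_n/n!$ directly. A small Python sketch (function names are our own; the sequence $(3, 1, 4, 1, 5)$ is arbitrary):

```python
from fractions import Fraction
from math import factorial

def to_egf(g):
    """Coefficients g_n / n! of the egf of the sequence g."""
    return [Fraction(x, factorial(n)) for n, x in enumerate(g)]

def from_egf(c):
    """Recover the sequence by multiplying the z^n coefficient by n!."""
    return [x * factorial(n) for n, x in enumerate(c)]

def times_z(c):
    """z * G(z): every coefficient moves up one place."""
    return [Fraction(0)] + list(c)

def derivative(c):
    """G'(z): the coefficient of z^(n-1) is n * c_n."""
    return [n * c[n] for n in range(1, len(c))]

def integral(c):
    """Integral of G from 0 to z: the coefficient of z^(n+1) is c_n / (n + 1)."""
    return [Fraction(0)] + [c[n] / (n + 1) for n in range(len(c))]

g = [3, 1, 4, 1, 5]
print([int(v) for v in from_egf(times_z(to_egf(g)))])     # [0, 3, 2, 12, 4, 25], i.e. n*g_{n-1}
print([int(v) for v in from_egf(derivative(to_egf(g)))])  # [1, 4, 1, 5], a left shift
print([int(v) for v in from_egf(integral(to_egf(g)))])    # [0, 3, 1, 4, 1, 5], a right shift
```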

The most interesting operation on egf’s, as on ordinary gf’s, is multiplication. If $F(z)$ and $G(z)$ are egf’s for $(f_n)$ and $(g_n)$, then $F(z)G(z) = H(z)$
is the egf for a sequence $(h_n)$ called the binomial convolution of $(f_n)$ and
$(g_n)$:
$$h_n = \sum_k\binom{n}{k}f_k g_{n-k}. \tag{7.75}$$
Binomial coefficients appear here because $\binom{n}{k} = n!/k!\,(n-k)!$, hence
$$\frac{h_n}{n!} = \sum_{k=0}^{n}\frac{f_k}{k!}\,\frac{g_{n-k}}{(n-k)!};$$
in other words, $(h_n/n!)$ is the ordinary convolution of $(f_n/n!)$ and $(g_n/n!)$.
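A short Python check of (7.75) (names hypothetical): the binomial convolution computed directly agrees with $n!$ times the ordinary convolution of $(f_n/n!)$ and $(g_n/n!)$. With $f_n = 2^n$ and $g_n = 3^n$ the egfs are $e^{2z}$ and $e^{3z}$, so the result should be $(5^n)$:

```python
from fractions import Fraction
from math import comb, factorial

def binomial_convolution(f, g):
    """h_n = sum_k binom(n, k) f_k g_{n-k}, as in (7.75)."""
    N = min(len(f), len(g))
    return [sum(comb(n, k) * f[k] * g[n - k] for k in range(n + 1)) for n in range(N)]

def via_ordinary_convolution(f, g):
    """The same h_n, as n! times the ordinary convolution of (f_n/n!) and (g_n/n!)."""
    N = min(len(f), len(g))
    a = [Fraction(f[n], factorial(n)) for n in range(N)]
    b = [Fraction(g[n], factorial(n)) for n in range(N)]
    return [factorial(n) * sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

ones = [1] * 8
twos = [2 ** n for n in range(8)]
threes = [3 ** n for n in range(8)]
print(binomial_convolution(ones, ones))  # [1, 2, 4, 8, 16, 32, 64, 128]: e^z e^z = e^{2z}
assert binomial_convolution(twos, threes) == [5 ** n for n in range(8)]
assert binomial_convolution(twos, threes) == via_ordinary_convolution(twos, threes)
```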

Binomial convolutions occur frequently in applications. For example, we
defined the Bernoulli numbers in (6.79) by the implicit recurrence
$$\sum_{j=0}^{m}\binom{m+1}{j}B_j = [m = 0], \qquad\text{for all } m \ge 0;$$
this can be rewritten as a binomial convolution, if we substitute $n$ for $m + 1$
and add the term $B_n$ to both sides:
$$\sum_k\binom{n}{k}B_k = B_n + [n = 1], \qquad\text{for all } n \ge 0. \tag{7.76}$$
We can now relate this recurrence to power series (as promised in Chapter 6)
by introducing the egf for Bernoulli numbers, $B(z) = \sum_{n\ge0} B_n z^n/n!$. The
left-hand side of (7.76) is the binomial convolution of $(B_n)$ with the constant
sequence $(1, 1, 1, \ldots)$; hence the egf of the left-hand side is $B(z)e^z$. The egf
of the right-hand side is $\sum_{n\ge0}(B_n + [n = 1])z^n/n! = B(z) + z$. Therefore we
must have $B(z) = z/(e^z - 1)$; we have proved equation (6.81), which appears
also in Table 352 as equation (7.44).
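The recurrence determines the Bernoulli numbers one at a time, and the result can be checked against $B(z) = z/(e^z - 1)$ by verifying that $B(z)\,(e^z - 1)/z = 1$ coefficient by coefficient. A minimal sketch with exact fractions (the function name is our own):

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(N):
    """B_0, ..., B_N from (6.79): sum_{j=0}^{m} binom(m+1, j) B_j = [m = 0]."""
    B = []
    for m in range(N + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B

B = bernoulli_numbers(12)
print([str(b) for b in B[:5]])  # ['1', '-1/2', '1/6', '0', '-1/30']

# (7.76): sum_k binom(n, k) B_k = B_n + [n = 1] for all n >= 0.
assert all(sum(comb(n, k) * B[k] for k in range(n + 1)) == B[n] + int(n == 1)
           for n in range(13))

# B(z) * (e^z - 1)/z = 1, where (e^z - 1)/z = sum_k z^k / (k+1)!.
for n in range(13):
    coeff = sum(B[k] / factorial(k) * Fraction(1, factorial(n - k + 1)) for k in range(n + 1))
    assert coeff == int(n == 0)
```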
Now let’s look again at a sum that has been popping up frequently in
this book,
$$S_m(n) = 0^m + 1^m + 2^m + \cdots + (n-1)^m = \sum_{0\le k<n} k^m.$$
This time we will try to analyze the problem with generating functions, in
hopes that it will suddenly become simpler. We will consider $n$ to be fixed
and $m$ variable; thus our goal is to understand the coefficients of the power
series
$$S(z) = S_0(n) + S_1(n)z + S_2(n)z^2 + \cdots = \sum_{m\ge0} S_m(n)z^m.$$


We know that the generating function for $(1, k, k^2, \ldots)$ is
$$\frac{1}{1-kz} = \sum_{m\ge0} k^m z^m,$$
hence
$$S(z) = \sum_{m\ge0}\sum_{0\le k<n} k^m z^m = \sum_{0\le k<n}\frac{1}{1-kz}$$
by interchanging the order of summation. We can put this sum in closed
form,
$$S(z) = \frac{1}{z}\Bigl(\frac{1}{z^{-1}-0} + \frac{1}{z^{-1}-1} + \cdots + \frac{1}{z^{-1}-n+1}\Bigr) = \frac{1}{z}\bigl(H_{z^{-1}} - H_{z^{-1}-n}\bigr); \tag{7.77}$$
but we know nothing about expanding such a closed form in powers of $z$.

Exponential generating functions come to the rescue. The egf of our
sequence $(S_0(n), S_1(n), S_2(n), \ldots)$ is
$$S(z,n) = S_0(n) + S_1(n)\frac{z}{1!} + S_2(n)\frac{z^2}{2!} + \cdots = \sum_{m\ge0} S_m(n)\frac{z^m}{m!}.$$
To get these coefficients $S_m(n)$ we can use the egf for $(1, k, k^2, \ldots)$, namely
$$e^{kz} = \sum_{m\ge0} k^m\frac{z^m}{m!},$$
and we have
$$S(z,n) = \sum_{m\ge0}\sum_{0\le k<n} k^m\frac{z^m}{m!} = \sum_{0\le k<n} e^{kz}.$$
And the latter sum is a geometric progression, so there’s a closed form
$$S(z,n) = \frac{e^{nz}-1}{e^z-1}. \tag{7.78}$$
Eureka! All we need to do is figure out the coefficients of this relatively simple
function, and we’ll know $S_m(n)$, because $S_m(n) = m!\,[z^m]\,S(z,n)$.

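Here’s where Bernoulli numbers come into the picture. We observed a
moment ago that the egf for Bernoulli numbers is
$$B(z) = \sum_{k\ge0} B_k\frac{z^k}{k!} = \frac{z}{e^z-1};$$
hence we can write
$$S(z,n) = B(z)\,\frac{e^{nz}-1}{z} = \Bigl(B_0\frac{z^0}{0!} + B_1\frac{z^1}{1!} + B_2\frac{z^2}{2!} + \cdots\Bigr)\Bigl(n\frac{z^0}{1!} + n^2\frac{z^1}{2!} + n^3\frac{z^2}{3!} + \cdots\Bigr).$$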

The sum $S_m(n)$ is $m!$ times the coefficient of $z^m$ in this product. For example,
$$S_0(n) = 0!\Bigl(B_0\frac{n}{0!\,1!}\Bigr) = n;$$
$$S_1(n) = 1!\Bigl(B_0\frac{n^2}{0!\,2!} + B_1\frac{n}{1!\,1!}\Bigr) = \tfrac12 n^2 - \tfrac12 n;$$
$$S_2(n) = 2!\Bigl(B_0\frac{n^3}{0!\,3!} + B_1\frac{n^2}{1!\,2!} + B_2\frac{n}{2!\,1!}\Bigr) = \tfrac13 n^3 - \tfrac12 n^2 + \tfrac16 n.$$
We have therefore derived the formula $\Box_{n-1} = S_2(n) = \frac13 n(n-\frac12)(n-1)$ for
the umpteenth time, and this was the simplest derivation of all: In a few lines
we have found the general behavior of $S_m(n)$ for all $m$.

The general formula can be written
$$S_{m-1}(n) = \frac{1}{m}\bigl(B_m(n) - B_m(0)\bigr), \tag{7.79}$$
where $B_m(x)$ is the Bernoulli polynomial defined by
$$B_m(x) = \sum_k\binom{m}{k}B_k x^{m-k}. \tag{7.80}$$
Here’s why: The Bernoulli polynomial is the binomial convolution of the
sequence $(B_0, B_1, B_2, \ldots)$ with $(1, x, x^2, \ldots)$; hence the exponential generating
function for $(B_0(x), B_1(x), B_2(x), \ldots)$ is the product of their egf’s,

$$B(z,x) = \sum_{m\ge0} B_m(x)\frac{z^m}{m!} = \frac{z}{e^z-1}\sum_{m\ge0}\frac{x^m z^m}{m!} = \frac{ze^{xz}}{e^z-1}. \tag{7.81}$$
Equation (7.79) follows because the egf for $(0, S_0(n), 2S_1(n), \ldots)$ is, by (7.78),
$$z\,\frac{e^{nz}-1}{e^z-1} = B(z,n) - B(z,0).$$
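Formula (7.79) can be checked against direct summation. This Python sketch (names are our own) computes Bernoulli numbers from (6.79), evaluates $B_m(x)$ via (7.80), and compares $\frac1m(B_m(n) - B_m(0))$ with $S_{m-1}(n)$ for small $m$ and $n$:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(N):
    """B_0, ..., B_N from sum_{j=0}^{m} binom(m+1, j) B_j = [m = 0]."""
    B = []
    for m in range(N + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B

def bernoulli_poly(m, x, B):
    """B_m(x) = sum_k binom(m, k) B_k x^(m-k), formula (7.80)."""
    x = Fraction(x)
    return sum(comb(m, k) * B[k] * x ** (m - k) for k in range(m + 1))

def power_sum(m, n):
    """S_m(n) = 0^m + 1^m + ... + (n-1)^m by direct summation (0^0 counts as 1)."""
    return sum(k ** m for k in range(n))

B = bernoulli_numbers(10)
for m in range(1, 11):
    for n in range(15):
        assert power_sum(m - 1, n) == (bernoulli_poly(m, n, B) - bernoulli_poly(m, 0, B)) / m

print(power_sum(2, 10), Fraction(10) * (10 - Fraction(1, 2)) * (10 - 1) / 3)  # 285 285
```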


Let’s turn now to another problem for which egf’s are just the thing:
How many spanning trees are possible in the complete graph on $n$ vertices
$\{1, 2, \ldots, n\}$? Let’s call this number $t_n$. The complete graph has $\frac12 n(n-1)$
edges, one edge joining each pair of distinct vertices; so we’re essentially
looking for the total number of ways to connect up $n$ given things by drawing
$n - 1$ lines between them.

We have $t_1 = t_2 = 1$. Also $t_3 = 3$, because a complete graph on three
vertices is a fan of order 2; we know that $f_2 = 3$. And there are sixteen
spanning trees when $n = 4$:

[Figure (7.82): the sixteen spanning trees of the complete graph on four vertices.]

Hence $t_4 = 16$.

Our experience with the analogous problem for fans suggests that the best
way to tackle this problem is to single out one vertex, and to look at the blocks
or components that the spanning tree joins together when we ignore all edges
that touch the special vertex. If the non-special vertices form $m$ components
of sizes $k_1, k_2, \ldots, k_m$, then we can connect them to the special vertex in
$k_1k_2\ldots k_m$ ways. For example, in the case $n = 4$, we can consider the lower
left vertex to be special. The top row of (7.82) shows $3t_3$ cases where the other
three vertices are joined among themselves in $t_3$ ways and then connected to
the lower left in 3 ways. The bottom row shows $2\cdot1 \times t_2t_1 \times \binom32$ solutions where
the other three vertices are divided into components of sizes 2 and 1 in $\binom32$
ways; there’s also the case where the other three vertices are completely
unconnected among themselves.

This line of reasoning leads to the recurrence
$$t_n = \sum_{m>0}\frac{1}{m!}\sum_{\substack{k_1+\cdots+k_m = n-1\\ k_1,\ldots,k_m>0}}\binom{n-1}{k_1, k_2, \ldots, k_m}k_1k_2\ldots k_m\,t_{k_1}t_{k_2}\ldots t_{k_m}$$
for all $n > 1$. Here’s why: There are $\binom{n-1}{k_1, k_2, \ldots, k_m}$ ways to assign $n - 1$ elements
to a sequence of $m$ components of respective sizes $k_1, k_2, \ldots, k_m$; there are
$t_{k_1}t_{k_2}\ldots t_{k_m}$ ways to connect up those individual components with spanning
trees; there are $k_1k_2\ldots k_m$ ways to connect vertex $n$ to those components; and
we divide by $m!$ because we want to disregard the order of the components.
For example, when $n = 4$ the recurrence says that
$$t_4 = 3t_3 + \tfrac12\Bigl(\binom{3}{1,2}2t_1t_2 + \binom{3}{2,1}2t_2t_1\Bigr) + \tfrac16\Bigl(\binom{3}{1,1,1}t_1^3\Bigr) = 3t_3 + 6t_2t_1 + t_1^3.$$

The recurrence for $t_n$ looks formidable at first, possibly even frightening;
but it really isn’t bad, only convoluted. We can define
$$u_n = n\,t_n$$
and then everything simplifies considerably:
$$\frac{u_n}{n!} = \sum_{m>0}\frac{1}{m!}\sum_{k_1+k_2+\cdots+k_m = n-1}\frac{u_{k_1}}{k_1!}\,\frac{u_{k_2}}{k_2!}\ldots\frac{u_{k_m}}{k_m!}, \qquad\text{if } n > 1. \tag{7.83}$$
The inner sum is the coefficient of $z^{n-1}$ in the egf $U(z)$, raised to the $m$th
power; and we obtain the correct formula also when $n = 1$, if we add in the
term $U(z)^0$ that corresponds to the case $m = 0$. So
$$\frac{u_n}{n!} = [z^{n-1}]\sum_{m\ge0}\frac{1}{m!}U(z)^m = [z^{n-1}]\,e^{U(z)} = [z^n]\,ze^{U(z)}$$
for all $n > 0$, and we have the equation
$$U(z) = ze^{U(z)}. \tag{7.84}$$

Progress! Equation (7.84) is almost like
$$\mathcal{E}(z) = e^{z\mathcal{E}(z)},$$
which defines the generalized exponential series $\mathcal{E}(z) = \mathcal{E}_1(z)$ in (5.59) and
(7.71); indeed, we have
$$U(z) = z\,\mathcal{E}(z).$$
So we can read off the answer to our problem:
$$t_n = \frac{u_n}{n} = \frac{n!}{n}[z^n]\,U(z) = (n-1)!\,[z^{n-1}]\,\mathcal{E}_1(z) = n^{n-2}. \tag{7.85}$$
The complete graph on $\{1, 2, \ldots, n\}$ has exactly $n^{n-2}$ spanning trees, for all
$n > 0$.
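Equation (7.84) can also be turned into a computation. The sketch below (our own) builds the coefficients of $U(z)$ one at a time, using the relation $E' = U'E$ for $E = e^{U}$ to get the coefficients of $e^{U(z)}$, and recovers $t_n = (n-1)!\,[z^n]\,U(z)$. As an independent cross-check for small $n$ it also counts spanning trees by brute force with a simple union-find cycle test, a technique that is not used in the text:

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def spanning_tree_counts(N):
    """[None, t_1, ..., t_N] from U(z) = z e^{U(z)} (7.84), with u_n = n t_n.

    c[n] = u_n / n! are the coefficients of U(z); e[n] = [z^n] e^{U(z)}.
    """
    c = [Fraction(0)] * (N + 1)
    e = [Fraction(1)] + [Fraction(0)] * N  # e^{U(0)} = e^0 = 1
    for n in range(1, N + 1):
        c[n] = e[n - 1]  # [z^n] U = [z^(n-1)] e^U
        # E = e^U satisfies E' = U' E, so n e_n = sum_{k=1}^{n} k c_k e_{n-k}.
        e[n] = sum(k * c[k] * e[n - k] for k in range(1, n + 1)) / n
    return [None] + [factorial(n - 1) * c[n] for n in range(1, N + 1)]  # t_n = u_n / n

def find(parent, v):
    while parent[v] != v:
        v = parent[v]
    return v

def brute_force_count(n):
    """Count (n-1)-edge subsets of the complete graph on n vertices with no cycle."""
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                break  # this edge would close a cycle
            parent[ru] = rv
        else:
            count += 1  # n - 1 edges and no cycle: a spanning tree
    return count

t = spanning_tree_counts(10)
print([int(x) for x in t[1:]])  # [1, 1, 3, 16, 125, 1296, 16807, 262144, 4782969, 100000000]
assert all(t[n] == n ** (n - 2) for n in range(2, 11))
assert all(brute_force_count(n) == t[n] for n in range(1, 6))
```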
7.7 DIRICHLET GENERATING FUNCTIONS

There are many other possible ways to generate a sequence from a
series; any system of “kernel” functions $K_n(z)$ such that
$$\sum_n g_nK_n(z) = 0 \implies g_n = 0 \text{ for all } n$$
can be used, at least in principle. Ordinary generating functions use $K_n(z) = z^n$,
and exponential generating functions use $K_n(z) = z^n/n!$; we could also
try falling factorial powers $z^{\underline n}$, or binomial coefficients $z^{\underline n}/n! = \binom{z}{n}$.

The most important alternative to gf’s and egf’s uses the kernel functions
$1/n^z$; it is intended for sequences $(g_1, g_2, \ldots)$ that begin with $n = 1$ instead
of $n = 0$:
$$G(z) = \sum_{n\ge1}\frac{g_n}{n^z}. \tag{7.86}$$
This is called a Dirichlet generating function (dgf), because the German
mathematician Gustav Lejeune Dirichlet (1805–1859) made much of it.

For example, the dgf of the constant sequence $(1, 1, 1, \ldots)$ is
$$\sum_{n\ge1}\frac{1}{n^z} = \zeta(z). \tag{7.87}$$
This is Riemann’s zeta function, which we have also called the generalized
harmonic number $H_\infty^{(z)}$ when $z > 1$.

The product of Dirichlet generating functions corresponds to a special
kind of convolution:
$$F(z)G(z) = \sum_{l\ge1}\frac{f_l}{l^z}\sum_{m\ge1}\frac{g_m}{m^z} = \sum_{n\ge1}\frac{1}{n^z}\sum_{l,m\ge1}f_l\,g_m\,[l\cdot m = n].$$
Thus $F(z)G(z) = H(z)$ is the dgf of the sequence
$$h_n = \sum_{d\backslash n}f_d\,g_{n/d}. \tag{7.88}$$
For example, we know from (4.55) that $\sum_{d\backslash n}\mu(d) = [n = 1]$; this is
the Dirichlet convolution of the Möbius sequence $(\mu(1), \mu(2), \mu(3), \ldots)$ with
$(1, 1, 1, \ldots)$, hence
$$M(z)\zeta(z) = \sum_{n\ge1}\frac{[n = 1]}{n^z} = 1. \tag{7.89}$$
In other words, the dgf of $(\mu(1), \mu(2), \mu(3), \ldots)$ is $\zeta(z)^{-1}$.
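Dirichlet convolution (7.88) is easy to compute for all $n \le N$ at once by looping over pairs $(d, q)$ with $dq \le N$. A Python sketch (names hypothetical) that checks (7.89) numerically:

```python
def dirichlet_convolution(f, g, N):
    """h_n = sum over divisors d of n of f(d) g(n/d), for 1 <= n <= N, as in (7.88).

    Returns a list indexed by n; entry 0 is unused.
    """
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            h[d * q] += f(d) * g(q)
    return h

def mobius(n):
    """mu(n) by trial division: 0 if a square > 1 divides n, else (-1)^(number of prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

N = 100
h = dirichlet_convolution(mobius, lambda n: 1, N)
print(h[1:11])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
assert h[1:] == [1 if n == 1 else 0 for n in range(1, N + 1)]  # M(z) zeta(z) = 1
```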
Dirichlet generating functions are particularly valuable when the sequence $(g_1, g_2, \ldots)$ is a multiplicative function, namely when
$$g_{mn} = g_mg_n \qquad\text{for } m \perp n.$$
In such cases the values of $g_n$ for all $n$ are determined by the values of $g_n$ when
$n$ is a power of a prime, and we can factor the dgf into a product over primes:
$$G(z) = \prod_{p\text{ prime}}\Bigl(1 + \frac{g_p}{p^z} + \frac{g_{p^2}}{p^{2z}} + \frac{g_{p^3}}{p^{3z}} + \cdots\Bigr). \tag{7.90}$$


If, for instance, we set $g_n = 1$ for all $n$, we obtain a product representation
of Riemann’s zeta function:
$$\zeta(z) = \prod_{p\text{ prime}}\Bigl(\frac{1}{1-p^{-z}}\Bigr). \tag{7.91}$$
The Möbius function has $\mu(p) = -1$ and $\mu(p^k) = 0$ for $k > 1$, hence its dgf is
$$M(z) = \prod_{p\text{ prime}}(1 - p^{-z}); \tag{7.92}$$
this agrees, of course, with (7.89) and (7.91). Euler’s $\varphi$ function has
$\varphi(p^k) = p^k - p^{k-1}$, hence its dgf has the factored form
$$\Phi(z) = \prod_{p\text{ prime}}\Bigl(1 + \frac{p-1}{p^z} + \frac{p^2-p}{p^{2z}} + \frac{p^3-p^2}{p^{3z}} + \cdots\Bigr) = \prod_{p\text{ prime}}\frac{1-p^{-z}}{1-p^{1-z}}. \tag{7.93}$$
We conclude that $\Phi(z) = \zeta(z-1)/\zeta(z)$.
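The factored form can be sanity-checked without any analysis: $\Phi(z)\zeta(z) = \zeta(z-1)$ says that the Dirichlet convolution of $(\varphi(n))$ with $(1, 1, 1, \ldots)$ is $(n)$. This sketch (our own) computes $\varphi$ from its prime-power values by multiplicativity, as in (7.90), and checks both that identity and the count-of-units definition of $\varphi$:

```python
from math import gcd

def factorize(n):
    """Return {p: k} with n = product of p**k, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi(n):
    """Euler's totient from phi(p^k) = p^k - p^(k-1), using multiplicativity."""
    result = 1
    for p, k in factorize(n).items():
        result *= p ** k - p ** (k - 1)
    return result

def dirichlet_convolution(f, g, N):
    """h_n = sum over d dividing n of f(d) g(n/d), for 1 <= n <= N (entry 0 unused)."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            h[d * q] += f(d) * g(q)
    return h

N = 200
# The product formula agrees with phi(n) = #{1 <= k <= n : gcd(k, n) = 1}.
assert all(phi(n) == sum(1 for k in range(1, n + 1) if gcd(k, n) == 1) for n in range(1, N + 1))
# Phi(z) zeta(z) = zeta(z - 1): convolving phi with (1, 1, 1, ...) gives (n).
h = dirichlet_convolution(phi, lambda n: 1, N)
assert h[1:] == list(range(1, N + 1))
```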


Exercises 

Warmups 

1 An eccentric collector of 2 × n domino tilings pays $4 for each vertical
domino and $1 for each horizontal domino. How many tilings are worth
exactly $m by this criterion? For example, when m = 6 there are three
solutions: [three tiling diagrams].

2 Give the generating function and the exponential generating function for
the sequence $(2, 5, 13, 35, \ldots) = (2^n + 3^n)$ in closed form.

3 What is $\sum_{n\ge0} H_n/10^n$?

4 The general expansion theorem for rational functions P(z)/Q(z) is not 
completely general, because it restricts the degree of P to be less than 
the degree of Q. What happens if P has a larger degree than this? 
5 Find a generating function S(z) such that



Basics 

6 Show that the recurrence (7.32) can be solved by the repertoire method, 
without using generating functions. 

7 Solve the recurrence 


I deduce that Clark 
Kent is really 
Superman. 


$$g_0 = 1;\qquad g_n = g_{n-1} + 2g_{n-2} + \cdots + ng_0, \quad\text{for } n > 0.$$

8 What is $[z^n]\,(\ln(1-z))^2/(1-z)^{m+1}$?

9 Use the result of the previous exercise to evaluate $\sum_{k=0}^{n}H_kH_{n-k}$.

10 Set $r = s = -1/2$ in identity (7.62) and then remove all occurrences of
$1/2$ by using tricks like (5.36). What amazing identity do you deduce?

11 This problem, whose three parts are independent, gives practice in the
manipulation of generating functions. We assume that $A(z) = \sum_n a_nz^n$,
$B(z) = \sum_n b_nz^n$, $C(z) = \sum_n c_nz^n$, and that the coefficients are zero for
negative $n$.
a If $c_n = \sum_{j+2k\le n}a_jb_k$, express $C$ in terms of $A$ and $B$.
b If $nb_n = \sum_{k=0}^{n}2^ka_k/(n-k)!$, express $A$ in terms of $B$.
c If $r$ is a real number and if $a_n = \sum_{k=0}^{n}\binom{r+k}{k}b_{n-k}$, express $A$ in
terms of $B$; then use your formula to find coefficients $f_k(r)$ such that
$b_n = \sum_{k=0}^{n}f_k(r)\,a_{n-k}$.

12 How many ways are there to put the numbers $\{1, 2, \ldots, 2n\}$ into a $2 \times n$
array so that rows and columns are in increasing order from left to right
and from top to bottom? For example, one solution when $n = 5$ is
$$\begin{pmatrix} 1 & 2 & 4 & 5 & 8 \\ 3 & 6 & 7 & 9 & 10 \end{pmatrix}.$$


13 Prove Raney’s generalized lemma, which is stated just before (7.70). 

14 Solve the recurrence
$$g_0 = 0,\quad g_1 = 1,\qquad g_n = -2ng_{n-1} + \sum_k\binom{n}{k}g_kg_{n-k}, \quad\text{for } n > 1,$$
by using an exponential generating function.
15 The Bell number $\varpi_n$ is the number of ways to partition $n$ things into
subsets. For example, $\varpi_3 = 5$ because we can partition $\{1, 2, 3\}$ in the
following ways:
$$\{1,2,3\};\quad \{1,2\}\cup\{3\};\quad \{1,3\}\cup\{2\};\quad \{1\}\cup\{2,3\};\quad \{1\}\cup\{2\}\cup\{3\}.$$
Prove that $\varpi_{n+1} = \sum_k\binom{n}{k}\varpi_{n-k}$, and use this recurrence to find a closed
form for the exponential generating function $P(z) = \sum_n\varpi_nz^n/n!$.

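16 Two sequences $(a_n)$ and $(b_n)$ are related by the convolution formula
$$b_n = \sum_{k_1+2k_2+\cdots+nk_n = n}\binom{a_1+k_1-1}{k_1}\binom{a_2+k_2-1}{k_2}\ldots\binom{a_n+k_n-1}{k_n};$$
also $a_0 = 0$ and $b_0 = 1$. Prove that the corresponding generating functions satisfy $\ln B(z) = A(z) + \frac12A(z^2) + \frac13A(z^3) + \cdots$.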

17 Show that the exponential generating function $\widehat G(z)$ of a sequence is related to the ordinary generating function $G(z)$ by the formula
$$\int_0^\infty\widehat G(zt)\,e^{-t}\,dt = G(z),$$
if the integral exists.

18 Find the Dirichlet generating functions for the sequences
a $g_n = \sqrt{n}$;
b $g_n = \ln n$;
c $g_n = [n \text{ is squarefree}]$.
Express your answers in terms of the zeta function. (Squarefreeness is
defined in exercise 4.13.)

19 Every power series $F(z) = \sum_{n\ge0}f_nz^n$ with $f_0 = 1$ defines a sequence of
polynomials $f_n(x)$ by the rule
$$F(z)^x = \sum_{n\ge0}f_n(x)z^n,$$
where $f_n(1) = f_n$ and $f_n(0) = [n = 0]$. In general, $f_n(x)$ has degree $n$.
Show that such polynomials always satisfy the convolution formulas
$$\sum_{k=0}^{n}f_k(x)f_{n-k}(y) = f_n(x+y);$$
$$(x+y)\sum_{k=0}^{n}kf_k(x)f_{n-k}(y) = xnf_n(x+y).$$
(The identities in Tables 202 and 272 are special cases of this trick.)


What do you mean,
“in general”? If
$f_1 = f_2 = \cdots = f_{m-1} = 0$,
the degree of $f_n(x)$ is
at most $\lfloor n/m\rfloor$.
Will he settle for
2 × n domino
tilings?


At union rates, as 
many as you can 
afford, plus a few. 


20 A power series $G(z)$ is called differentiably finite if there exist finitely
many polynomials $P_0(z), \ldots, P_m(z)$, not all zero, such that
$$P_0(z)G(z) + P_1(z)G'(z) + \cdots + P_m(z)G^{(m)}(z) = 0.$$
A sequence of numbers $(g_0, g_1, g_2, \ldots)$ is called polynomially recursive
if there exist finitely many polynomials $p_0(z), \ldots, p_m(z)$, not all zero,
such that
$$p_0(n)g_n + p_1(n)g_{n+1} + \cdots + p_m(n)g_{n+m} = 0$$
for all integers $n \ge 0$. Prove that a generating function is differentiably
finite if and only if its sequence of coefficients is polynomially recursive.

Homework exercises 

21 A robber holds up a bank and demands $500 in tens and twenties. He
also demands to know the number of ways in which the cashier can give
him the money. Find a generating function G(z) for which this number
is [z^500] G(z), and a more compact generating function Ǧ(z) for which
this number is [z^50] Ǧ(z). Determine the required number of ways by
(a) using partial fractions; (b) using a method like (7.39).

22 Let P be the sum of all ways to “triangulate” polygons: 



(The first term represents a degenerate polygon with only two vertices; 
every other term shows a polygon that has been divided into triangles. 
For example, a pentagon can be triangulated in five ways.) Define a 
“multiplication” operation $A \mathbin{\triangle} B$ on triangulated polygons $A$ and $B$ so
that the equation
$$P = \_\_ + P \mathbin{\triangle} P$$

is valid. Then replace each triangle by ‘z’; what does this tell you about 
the number of ways to decompose an n-gon into triangles? 

23 In how many ways can a $2 \times 2 \times n$ pillar be built out of $2 \times 1 \times 1$ bricks?

24 How many spanning trees are in an n-wheel (a graph with n “outer” 
vertices in a cycle, each connected to an (n + 1)st “hub” vertex), when 
$n \ge 3$?
25 Let $m \ge 2$ be an integer. What is a closed form for the generating
function of the sequence $(n \bmod m)$, as a function of $z$ and $m$? Use
this generating function to express ‘$n \bmod m$’ in terms of the complex
number $\omega = e^{2\pi i/m}$. (For example, when $m = 2$ we have $\omega = -1$ and
$n \bmod 2 = \frac12 - \frac12(-1)^n$.)

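26 The second-order Fibonacci numbers $(\mathfrak{F}_n)$ are defined by the recurrence
$$\mathfrak{F}_0 = 0;\quad \mathfrak{F}_1 = 1;\qquad \mathfrak{F}_n = \mathfrak{F}_{n-1} + \mathfrak{F}_{n-2} + F_n, \quad\text{for } n > 1.$$
Express $\mathfrak{F}_n$ in terms of the usual Fibonacci numbers $F_n$ and $F_{n+1}$.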

27 A 2 × n domino tiling can also be regarded as a way to draw n disjoint
lines in a 2 × n array of points:

[figure]

If we superimpose two such patterns, we get a set of cycles, since every point is touched by two lines. For example, if the lines above are
combined with the lines

[figure]

the result is

[figure].

The same set of cycles is also obtained by combining

[figure] with [figure].

But we get a unique way to reconstruct the original patterns from the
superimposed ones if we assign orientations to the vertical lines by using
arrows that go alternately up/down/up/down/⋯ in the first pattern and
alternately down/up/down/up/⋯ in the second. For example,

[figure]

The number of such oriented cycle patterns must therefore be $F_{n+1}^2$,
and we should be able to prove this via algebra. Let $Q_n$ be the number
of oriented 2 × n cycle patterns. Find a recurrence for $Q_n$, solve it with
generating functions, and deduce algebraically that $Q_n = F_{n+1}^2$.

28 The coefficients of $A(z)$ in (7.39) satisfy $A_r + A_{r+10} + A_{r+20} + A_{r+30} = 100$
for $0 \le r < 10$. Find a “simple” explanation for this.
29 What is the sum of Fibonacci products
$$\sum_{m>0}\ \sum_{\substack{k_1+k_2+\cdots+k_m = n\\ k_1,k_2,\ldots,k_m>0}}F_{k_1}F_{k_2}\ldots F_{k_m}\,?$$

30 If the generating function $G(z) = 1/(1-\alpha z)(1-\beta z)$ has the partial
fraction decomposition $a/(1-\alpha z) + b/(1-\beta z)$, what is the partial fraction
decomposition of $G(z)^n$?

31 What function $g(n)$ of the positive integer $n$ satisfies the recurrence
$$\sum_{d\backslash n}g(d)\,\varphi(n/d) = 1,$$
where $\varphi$ is Euler’s totient function?

32 An arithmetic progression is an infinite set of integers
$$\{an + b\} = \{b,\ a+b,\ 2a+b,\ 3a+b,\ \ldots\}.$$
A set of arithmetic progressions $\{a_1n + b_1\}, \ldots, \{a_mn + b_m\}$ is called an
exact cover if every nonnegative integer occurs in one and only one of the
progressions. For example, the three progressions $\{2n\}$, $\{4n + 1\}$, $\{4n + 3\}$
constitute an exact cover. Show that if $\{a_1n + b_1\}, \ldots, \{a_mn + b_m\}$ is
an exact cover such that $2 \le a_1 \le \cdots \le a_m$, then $a_{m-1} = a_m$. Hint:
Use generating functions.

Exam problems 

33 What is $[w^mz^n]\,\bigl(\ln(1+z)\bigr)/(1-wz)$?

34 Find a closed form for the generating function $\sum_{n\ge0} G_n(z)w^n$, if

$$G_n(z) = \sum_{k\le n/m}\binom{n-mk}{k}z^{mk}\,.$$

(Here m is a fixed positive integer.)

35 Evaluate the sum $\sum_{0<k<n} 1/k(n-k)$ in two ways:
a Expand the summand in partial fractions. 

b Treat the sum as a convolution and use generating functions. 

36 Let A(z) be the generating function for $\langle a_0, a_1, a_2, a_3, \ldots\rangle$. Express
$\sum_n a_{\lfloor n/m\rfloor}z^n$ in terms of A, z, and m.
37 Let $a_n$ be the number of ways to write the positive integer n as a sum of
powers of 2, disregarding order. For example, $a_4 = 4$, since 4 = 2 + 2 =
2 + 1 + 1 = 1 + 1 + 1 + 1. By convention we let $a_0 = 1$. Let $b_n = \sum_{k=0}^n a_k$
be the cumulative sum of the first a’s.

a Make a table of the a’s and b’s up through n = 10. What amazing
relation do you observe in your table? (Don’t prove it yet.)
b Express the generating function A(z) as an infinite product.
c Use the expression from part (b) to prove the result of part (a).
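For part (a), the table itself can be produced mechanically. This hypothetical sketch counts partitions into powers of 2 with the standard unbounded “coin change” update (each power of 2 acting as a coin in unlimited supply); it only builds the table and leaves the observation, and its proof, to the reader.

```python
from itertools import accumulate


def binary_partitions(limit):
    """a[n] = number of ways to write n as a sum of powers of 2, ignoring order."""
    a = [1] + [0] * limit          # a[0] = 1 by convention
    power = 1
    while power <= limit:
        # allow any number of parts equal to `power`
        for n in range(power, limit + 1):
            a[n] += a[n - power]
        power *= 2
    return a


a = binary_partitions(10)
b = list(accumulate(a))            # b[n] = a[0] + a[1] + ... + a[n]
for n in range(11):
    print(n, a[n], b[n])
```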

38 Find a closed form for the double generating function

$$M(w,z) = \sum_{m,n\ge0}\min(m,n)\,w^mz^n\,.$$

Generalize your answer to obtain, for fixed $m \ge 2$, a closed form for

$$M(z_1,\ldots,z_m) = \sum_{n_1,\ldots,n_m\ge0}\min(n_1,\ldots,n_m)\,z_1^{n_1}\ldots z_m^{n_m}\,.$$


39 Given positive integers m and n, find closed forms for

$$\sum_{1\le k_1<k_2<\cdots<k_m\le n} k_1k_2\ldots k_m \qquad\text{and}\qquad \sum_{1\le k_1\le k_2\le\cdots\le k_m\le n} k_1k_2\ldots k_m\,.$$

(For example, when m = 2 and n = 3 the sums are 1·2 + 1·3 + 2·3 and
1·1 + 1·2 + 1·3 + 2·2 + 2·3 + 3·3.) Hint: What are the coefficients of $z^m$ in the
generating functions $(1+a_1z)\ldots(1+a_nz)$ and $1/\bigl((1-a_1z)\ldots(1-a_nz)\bigr)$?

40 Express $\sum_k\binom nk\bigl(kF_{k-1}-F_k\bigr)(n-k)¡$ in closed form.

41 An up-down permutation of order n is an arrangement $a_1a_2\ldots a_n$ of
the integers {1, 2, ..., n} that goes alternately up and down:

$$a_1 < a_2 > a_3 < a_4 > \cdots\,.$$

For example, 35142 is an up-down permutation of order 5. If $A_n$ denotes
the number of up-down permutations of order n, show that the
exponential generating function of $\langle A_n\rangle$ is $(1+\sin z)/\cos z$.
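Before working with the egf it helps to have the first few values of $A_n$ to compare against its coefficients. This brute-force sketch (hypothetical helpers `is_up_down` and `count_up_down`) simply enumerates all permutations, so it is practical only for small n and proves nothing about the egf.

```python
from itertools import permutations


def is_up_down(p):
    """True if p[0] < p[1] > p[2] < p[3] > ... (alternately up and down)."""
    return all(p[i] < p[i + 1] if i % 2 == 0 else p[i] > p[i + 1]
               for i in range(len(p) - 1))


def count_up_down(n):
    """Number of up-down permutations of {1, ..., n}, by exhaustive search."""
    return sum(1 for p in permutations(range(1, n + 1)) if is_up_down(p))


print(is_up_down((3, 5, 1, 4, 2)))               # the example 35142
print([count_up_down(n) for n in range(1, 8)])
```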

42 A space probe has discovered that organic material on Mars has DNA
composed of five symbols, denoted by (a, b, c, d, e), instead of the four
components in earthling DNA. The four pairs cd, ce, ed, and ee never
occur consecutively in a string of Martian DNA, but any string without
forbidden pairs is possible. (Thus bbcda is forbidden but bbdca is
OK.) How many Martian DNA strings of length n are possible? (When
n = 2 the answer is 21, because the left and right ends of a string are
distinguishable.)
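Small cases of this count can be checked by brute force, which is handy for testing a closed form once one is found. In this sketch `allowed` and `count_martian` are hypothetical helper names; the enumeration grows like $5^n$, so it is only meant for small n.

```python
from itertools import product

FORBIDDEN = {"cd", "ce", "ed", "ee"}


def allowed(s):
    """True if no two consecutive symbols of s form a forbidden pair."""
    return all(s[i:i + 2] not in FORBIDDEN for i in range(len(s) - 1))


def count_martian(n):
    """Brute-force count of allowed strings of length n over {a, b, c, d, e}."""
    return sum(1 for t in product("abcde", repeat=n) if allowed("".join(t)))


print(allowed("bbcda"), allowed("bbdca"))   # False True
print([count_martian(n) for n in range(1, 6)])
```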
43 The Newtonian generating function of a sequence $\langle g_n\rangle$ is defined to be

$$G(z) = \sum_n g_n\binom zn\,.$$

Find a convolution formula that defines the relation between sequences
$\langle f_n\rangle$, $\langle g_n\rangle$, and $\langle h_n\rangle$ whose Newtonian generating functions are related
by the equation F(z)G(z) = H(z). Try to make your formula as simple
and symmetric as possible.

44 Let $q_n$ be the number of possible outcomes when n numbers $\{x_1, \ldots, x_n\}$
are compared with each other. For example, $q_3 = 13$ because the possibilities are

$x_1<x_2<x_3$;   $x_1<x_3<x_2$;   $x_2<x_1<x_3$;   $x_2<x_3<x_1$;
$x_3<x_1<x_2$;   $x_3<x_2<x_1$;   $x_1=x_2<x_3$;   $x_1=x_3<x_2$;
$x_2=x_3<x_1$;   $x_1<x_2=x_3$;   $x_2<x_1=x_3$;   $x_3<x_1=x_2$;
$x_1=x_2=x_3$.

Find a closed form for the egf $Q(z) = \sum_n q_nz^n/n!$. Also find sequences
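$\langle a_n\rangle$, $\langle b_n\rangle$, $\langle c_n\rangle$ such that

$$q_n = \sum_{k\ge0}k^na_k = \sum_k {n\brace k}b_k = \sum_k {n\brack k}c_k\,,$$

for all n > 0.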
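Small values of $q_n$ are easy to confirm by brute force. An outcome of the comparisons is determined by the rank pattern of the values, and n numbers take at most n distinct values, so trying values in {0, ..., n − 1} covers every outcome. The helper name `comparison_outcomes` is hypothetical.

```python
from itertools import product


def comparison_outcomes(n):
    """Count the distinct results of comparing n numbers x_1, ..., x_n (ties allowed)."""
    patterns = set()
    for values in product(range(n), repeat=n):
        distinct = sorted(set(values))
        # replace each value by its rank among the distinct values
        patterns.add(tuple(distinct.index(v) for v in values))
    return len(patterns)


print([comparison_outcomes(n) for n in range(1, 5)])
```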


45 Evaluate $\sum_{m,n>0}[m\perp n]/m^2n^2$.

46 Evaluate 



in closed form. Hint: $z^3 - z^2 + \frac4{27} = \bigl(z+\frac13\bigr)\bigl(z-\frac23\bigr)^2$.

47 Show that the numbers $U_n$ and $V_n$ of $3\times n$ domino tilings, as given in
(7.34), are closely related to the fractions in the Stern-Brocot tree that
converge to $\sqrt3$.

48 A certain sequence $\langle g_n\rangle$ satisfies the recurrence

$$ag_n + bg_{n+1} + cg_{n+2} + d = 0\,,\qquad\text{integer } n\ge0,$$

for some integers (a, b, c, d) with gcd(a, b, c, d) = 1. It also has the closed
form

$$g_n = \bigl\lfloor\alpha(1+\sqrt2)^n\bigr\rfloor\,,\qquad\text{integer } n\ge0,$$

for some real number α between 0 and 1. Find a, b, c, d, and α.
49 This is a problem about powers and parity.

a Consider the sequence $\langle a_0, a_1, a_2, \ldots\rangle = \langle 2, 2, 6, \ldots\rangle$ defined by the
formula

$$a_n = (1+\sqrt2)^n + (1-\sqrt2)^n\,.$$

Find a simple recurrence relation that is satisfied by this sequence.
b Prove that $\lceil(1+\sqrt2)^n\rceil \equiv n \pmod 2$ for all integers n > 0.
c Find a number α of the form $(p+\sqrt q)/2$, where p and q are positive
integers, such that $\lfloor\alpha^n\rfloor \equiv n \pmod 2$ for all integers n > 0.

Bonus problems 

50 Continuing exercise 22, consider the sum of all ways to decompose polygons
into polygons:

Q = [figure: a line segment, plus a triangle, plus all decompositions of larger polygons]

Find a symbolic equation for Q and use it to find a generating function
for the number of ways to draw nonintersecting diagonals inside a convex
n-gon. (Give a closed form for the generating function as a function of z;
you need not find a closed form for the coefficients.)

51 Prove that the product

$$2^{mn/2}\prod_{1\le j\le m}\ \prod_{1\le k\le n}\Bigl(w^2\cos^2\frac{j\pi}{m+1} + z^2\cos^2\frac{k\pi}{n+1}\Bigr)^{1/4}$$

is the generating function for tilings of an $m\times n$ rectangle with dominoes.
(There are mn factors, which we can imagine are written in the mn cells
of the rectangle. If mn is odd, the middle factor is zero. The coefficient
of $w^jz^k$ is the number of ways to do the tiling with j vertical and k
horizontal dominoes.) Hint: This is a difficult problem, really beyond
the scope of this book. You may wish to simply verify the formula in the
case m = 3, n = 4.
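For the suggested case m = 3, n = 4 a numerical check is easy. The sketch below (hypothetical helpers `tiling_counts` and `product_formula`) counts tilings by brute force, assuming m is the number of rows so that a vertical domino covers two adjacent rows, and compares the weighted count with the product evaluated in floating point. Agreement at a few points is a sanity check, not the proof the exercise asks for.

```python
import math


def tiling_counts(m, n):
    """Brute force: {(j, k): number of tilings of an m-row, n-column board
    that use j vertical and k horizontal dominoes}."""
    filled = [[False] * n for _ in range(m)]
    counts = {}

    def search(j, k):
        for r in range(m):
            for c in range(n):
                if not filled[r][c]:
                    # The first empty cell must be covered by a domino going right or down.
                    if c + 1 < n and not filled[r][c + 1]:
                        filled[r][c] = filled[r][c + 1] = True
                        search(j, k + 1)
                        filled[r][c] = filled[r][c + 1] = False
                    if r + 1 < m and not filled[r + 1][c]:
                        filled[r][c] = filled[r + 1][c] = True
                        search(j + 1, k)
                        filled[r][c] = filled[r + 1][c] = False
                    return
        counts[(j, k)] = counts.get((j, k), 0) + 1  # the board is completely covered

    search(0, 0)
    return counts


def product_formula(m, n, w, z):
    """Evaluate the product from the exercise at real values of w and z."""
    result = 2.0 ** (m * n / 2)
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            factor = ((w * math.cos(j * math.pi / (m + 1))) ** 2
                      + (z * math.cos(k * math.pi / (n + 1))) ** 2)
            result *= factor ** 0.25
    return result


counts = tiling_counts(3, 4)
weighted = sum(c * 2 ** j * 3 ** k for (j, k), c in counts.items())  # w = 2, z = 3
print(sum(counts.values()), product_formula(3, 4, 1, 1))
print(weighted, product_formula(3, 4, 2, 3))
```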

52 Prove that the polynomials defined by the recurrence 

Pn(y) = (y- - L WIcKt) Pk(y)> integern5>0, 

k— 0 k k ' 


have the form p n (y) = Hm=o lm|y n > where |^| is a positive integer for 
$1\le m\le n$. Hint: This exercise is very instructive but not very easy.


Kissinger, take note. 


Is this a hint or a 
warning? 
53 The sequence of pentagonal numbers $\langle 1, 5, 12, 22, \ldots\rangle$ generalizes the
triangular and square numbers in an obvious way:



Let the nth triangular number be $T_n = n(n+1)/2$; let the nth pentagonal
number be $P_n = n(3n-1)/2$; and let $U_n$ be the $3\times n$ domino-tiling
number defined in (7.38). Prove that the triangular number $T_{(U_{4n+2}-1)/2}$
is also a pentagonal number. Hint: $3U_{2n}^2 = (V_{2n-1}+V_{2n+1})^2 + 2$.

54 Consider the following curious construction: 


1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16  …
1    2    3    4         6    7    8    9        11   12   13   14        16  …
1    3    6   10        16   23   31   40        51   63   76   90       106  …
1    3    6             16   23   31             51   63   76            106  …
1    4   10             26   49   80            131  194  270            376  …
1    4                  26   49                 131  194                 376  …
1    5                  31   80                 211  405                 781  …
1                       31                      211                      781  …
1                       32                      243                     1024  …


(Start with a row containing all the positive integers. Then delete every
mth column; here m = 5. Then replace the remaining entries by partial
sums. Then delete every (m-1)st column. Then replace with partial
sums again, and so on.) Use generating functions to show that the final
result is the sequence of mth powers. For example, when m = 5 we get
$\langle 1^5, 2^5, 3^5, 4^5, \ldots\rangle$ as shown.
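The construction itself is easy to simulate on a finite prefix of the first row. This sketch (with a hypothetical helper `construction`) deletes every kth remaining column for k = m, m − 1, ..., 2 and takes partial sums after each deletion; it checks only small cases and does not replace the generating-function proof.

```python
from itertools import accumulate


def construction(m, length):
    """Apply the deletion/partial-sum process of the exercise to 1, 2, ..., length."""
    row = list(range(1, length + 1))
    for k in range(m, 1, -1):
        # delete every kth column (positions k, 2k, 3k, ... counting from 1)
        row = [x for i, x in enumerate(row, start=1) if i % k != 0]
        # replace the remaining entries by their partial sums
        row = list(accumulate(row))
    return row


print(construction(5, 16))   # the m = 5 example shown above
```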

55 Prove that if the power series F(z) and G(z) are differentiably finite (as 
defined in exercise 20), then so are F(z) + G(z) and F(z)G(z). 

Research problems 

56 Prove that there is no “simple closed form” for the coefficient of $z^n$ in
$(1+z+z^2)^n$, as a function of n, in some large class of “simple closed
forms.”

57 Prove or disprove: If all the coefficients of G(z) are either 0 or 1, and if
all the coefficients of $G(z)^2$ are less than some constant M, then infinitely
many of the coefficients of $G(z)^2$ are zero.
Discrete Probability

THE ELEMENT OF CHANCE enters into many of our attempts to understand
the world we live in. A mathematical theory of probability allows us
to calculate the likelihood of complex events if we assume that the events are 
governed by appropriate axioms. This theory has significant applications in 
all branches of science, and it has strong connections with the techniques we 
have studied in previous chapters. 

Probabilities are called “discrete” if we can compute the probabilities of 
all events by summation instead of by integration. We are getting pretty good 
at sums, so it should come as no great surprise that we are ready to apply 
our knowledge to some interesting calculations of probabilities and averages. 


8.1 DEFINITIONS 


(Readers unfamiliar with probability theory will, with high probability, benefit from a perusal of Feller’s classic introduction to the subject [120].)


Probability theory starts with the idea of a probability space, which
is a set Ω of all things that can happen in a given problem together with a
rule that assigns a probability $\Pr(\omega)$ to each elementary event $\omega\in\Omega$. The
probability $\Pr(\omega)$ must be a nonnegative real number, and the condition

$$\sum_{\omega\in\Omega}\Pr(\omega) = 1 \tag{8.1}$$

must hold in every discrete probability space. Thus, each value $\Pr(\omega)$ must lie
in the interval [0..1]. We speak of Pr as a probability distribution, because
it distributes a total probability of 1 among the events ω.

Here’s an example: If we’re rolling a pair of dice, the set Ω of elementary
events is $D^2 = \{\text{⚀⚀}, \text{⚀⚁}, \ldots, \text{⚅⚅}\}$, where

$$D = \{\text{⚀}, \text{⚁}, \text{⚂}, \text{⚃}, \text{⚄}, \text{⚅}\}$$

is the set of all six ways that a given die can land. Two rolls such as ⚀⚁
and ⚁⚀ are considered to be distinct; hence this probability space has a
total of $6^2 = 36$ elements.

Never say die.


We usually assume that dice are “fair,” meaning that each of the six possibilities
for a particular die has probability $\frac16$ and that each of the 36 possible rolls
in Ω has probability $\frac1{36}$. But we can also consider “loaded” dice in which
there is a different distribution of probabilities. For example, let

$$\Pr_1(\text{⚀}) = \Pr_1(\text{⚅}) = \tfrac14\,;$$
$$\Pr_1(\text{⚁}) = \Pr_1(\text{⚂}) = \Pr_1(\text{⚃}) = \Pr_1(\text{⚄}) = \tfrac18\,.$$

Then $\sum_{d\in D}\Pr_1(d) = 1$, so $\Pr_1$ is a probability distribution on the set D,
and we can assign probabilities to the elements of $\Omega = D^2$ by the rule

$$\Pr_{11}(dd') = \Pr_1(d)\,\Pr_1(d')\,. \tag{8.2}$$

For example, $\Pr_{11}(\text{⚅⚁}) = \frac14\cdot\frac18 = \frac1{32}$. This is a valid distribution because

$$\sum_{\omega\in\Omega}\Pr_{11}(\omega) = \sum_{dd'\in D^2}\Pr_{11}(dd') = \sum_{d,d'\in D}\Pr_1(d)\,\Pr_1(d') = \sum_{d\in D}\Pr_1(d)\sum_{d'\in D}\Pr_1(d') = 1\cdot1 = 1\,.$$

We can also consider the case of one fair die and one loaded die,

$$\Pr_{01}(dd') = \Pr_0(d)\,\Pr_1(d')\,,\qquad\text{where } \Pr_0(d) = \tfrac16\,, \tag{8.3}$$

in which case $\Pr_{01}(\text{⚅⚂}) = \frac16\cdot\frac18 = \frac1{48}$. Dice in the “real world” can’t
really be expected to turn up equally often on each side, because they aren’t
perfectly symmetrical; but $\frac16$ is usually pretty close to the truth.

An event is a subset of Ω. In dice games, for example, the set

$$\{\text{⚀⚀}, \text{⚁⚁}, \text{⚂⚂}, \text{⚃⚃}, \text{⚄⚄}, \text{⚅⚅}\}$$

is the event that “doubles are thrown.” The individual elements ω of Ω are
called elementary events because they cannot be decomposed into smaller
subsets; we can think of ω as a one-element event {ω}.

The probability of an event A is defined by the formula

$$\Pr(\omega\in A) = \sum_{\omega\in A}\Pr(\omega)\,; \tag{8.4}$$

and in general if R(ω) is any statement about ω, we write ‘Pr(R(ω))’ for the
sum of all Pr(ω) such that R(ω) is true. Thus, for example, the probability of
doubles with fair dice is $\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36} = \frac16$; but when both dice are
loaded with probability distribution $\Pr_1$ it is $\frac1{16}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{16} = \frac3{16} > \frac16$.
Loading the dice makes the event “doubles are thrown” more probable.
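The dice spaces above are small enough to enumerate exactly. The following Python sketch (an illustration, not taken from the text) encodes the faces ⚀…⚅ as the integers 1…6, builds $\Pr_{00}$, $\Pr_{11}$ and $\Pr_{01}$ with the product rules (8.2) and (8.3), and evaluates events by the sum (8.4) using exact `Fraction` arithmetic; the helper names `pair_distribution`, `pr` and `doubles` are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # 1..6 stand for the faces ⚀..⚅

# Pr_0: a fair die; Pr_1: the loaded die with Pr_1(1) = Pr_1(6) = 1/4.
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def pair_distribution(first, second):
    """Product rule (8.2)/(8.3): Pr(dd') = Pr_first(d) * Pr_second(d')."""
    return {(d, e): first[d] * second[e] for d, e in product(FACES, FACES)}


def pr(space, event):
    """Formula (8.4): add Pr(omega) over the elementary events where `event` holds."""
    return sum((p for omega, p in space.items() if event(omega)), Fraction(0))


def doubles(roll):
    return roll[0] == roll[1]


pr00 = pair_distribution(fair, fair)
pr11 = pair_distribution(loaded, loaded)
pr01 = pair_distribution(fair, loaded)

print(sum(pr11.values()))                      # 1: Pr_11 is a valid distribution
print(pr11[(6, 2)], pr01[(6, 3)])              # an extreme face times a middle face
print(pr(pr00, doubles), pr(pr11, doubles))    # doubles: fair vs. loaded
```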


Careful: They might go off.

If all sides of a cube were identical, how could we tell which side is face up?
(We have been using Σ-notation in a more general sense here than defined in
Chapter 2: The sums in (8.1) and (8.4) occur over all elements ω of an
arbitrary set, not over integers only. However, this new development is not
really alarming; we can agree to use special notation under a Σ whenever
nonintegers are intended, so there will be no confusion with our ordinary
conventions. The other definitions in Chapter 2 are still valid; in particular,
the definition of infinite sums in that chapter gives the appropriate
interpretation to our sums when the set Ω is infinite. Each probability is
nonnegative, and the sum of all probabilities is bounded, so the probability of
event A in (8.4) is well defined for all subsets A ⊆ Ω.)

A random variable is a function defined on the elementary events ω of a
probability space. For example, if $\Omega = D^2$ we can define S(ω) to be the sum
of the spots on the dice roll ω, so that $S(\text{⚅⚂}) = 6 + 3 = 9$. The probability
that the spots total seven is the probability of the event S(ω) = 7, namely

$$\Pr(\text{⚀⚅}) + \Pr(\text{⚁⚄}) + \Pr(\text{⚂⚃}) + \Pr(\text{⚃⚂}) + \Pr(\text{⚄⚁}) + \Pr(\text{⚅⚀})\,.$$

With fair dice ($\Pr = \Pr_{00}$), this happens with probability $\frac16$; but with loaded
dice ($\Pr = \Pr_{11}$), it happens with probability $\frac1{16}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{16} = \frac3{16}$,
the same as we observed for doubles.

It’s customary to drop the ‘(ω)’ when we talk about random variables,
because there’s usually only one probability space involved when we’re working
on any particular problem. Thus we say simply ‘S = 7’ for the event that
a 7 was rolled, and ‘S = 4’ for the event {⚀⚂, ⚁⚁, ⚂⚀}.

A random variable can be characterized by the probability distribution of
its values. Thus, for example, S takes on eleven possible values {2, 3, ..., 12},
and we can tabulate the probability that S = s for each s in this set:

s                2     3     4     5     6     7     8     9    10    11    12
Pr₀₀(S = s)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Pr₁₁(S = s)   4/64  4/64  5/64  6/64  7/64 12/64  7/64  6/64  5/64  4/64  4/64
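The table can be regenerated from the product rule. This sketch (hypothetical helper `distribution`, faces encoded as 1…6, loaded die as $\Pr_1$ above) accumulates $\Pr(S = s)$ exactly and prints the numerators over the common denominators 36 and 64.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}                                       # Pr_0
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}  # Pr_1


def distribution(first, second, X):
    """Return {x: Pr(X = x)} for a random variable X(d, e) on the two-dice space."""
    dist = {}
    for d, e in product(FACES, FACES):
        x = X(d, e)
        dist[x] = dist.get(x, Fraction(0)) + first[d] * second[e]
    return dict(sorted(dist.items()))


def spot_sum(d, e):
    return d + e


s00 = distribution(fair, fair, spot_sum)
s11 = distribution(loaded, loaded, spot_sum)
for s in s00:
    # numerators over 36 (fair dice) and over 64 (loaded dice)
    print(s, s00[s] * 36, s11[s] * 64)
```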

If we’re working on a problem that involves only the random variable S and no
other properties of dice, we can compute the answer from these probabilities
alone, without regard to the details of the set $\Omega = D^2$. In fact, we could
define the probability space to be the smaller set $\Omega = \{2, 3, \ldots, 12\}$, with
whatever probability distribution Pr(s) is desired. Then ‘S = 4’ would be
an elementary event. Thus we can often ignore the underlying probability
space Ω and work directly with random variables and their distributions.

If two random variables X and Y are defined over the same probability
space Ω, we can characterize their behavior without knowing everything
about Ω if we know the “joint distribution”

$$\Pr(X=x \text{ and } Y=y)$$

for each x in the range of X and each y in the range of Y. We say that X and
Y are independent random variables if

$$\Pr(X=x \text{ and } Y=y) = \Pr(X=x)\cdot\Pr(Y=y) \tag{8.5}$$

for all x and y. Intuitively, this means that the value of X has no effect on
the value of Y.

For example, if Ω is the set of dice rolls $D^2$, we can let $S_1$ be the number
of spots on the first die and $S_2$ the number of spots on the second. Then
the random variables $S_1$ and $S_2$ are independent with respect to each of the
probability distributions $\Pr_{00}$, $\Pr_{11}$, and $\Pr_{01}$ discussed earlier, because we
defined the dice probability for each elementary event dd′ as a product of a
probability for $S_1 = d$ multiplied by a probability for $S_2 = d'$. We could have
defined probabilities differently so that, say,

$$\Pr(\text{⚀⚄})/\Pr(\text{⚀⚅}) \ne \Pr(\text{⚁⚄})/\Pr(\text{⚁⚅})\,;$$

but we didn’t do that, because different dice aren’t supposed to influence each
other. With our definitions, both of these ratios are $\Pr(S_2=5)/\Pr(S_2=6)$.

We have defined S to be the sum of the two spot values, $S_1 + S_2$. Let’s
consider another random variable P, the product $S_1S_2$. Are S and P independent?
Informally, no; if we are told that S = 2, we know that P must be 1.
Formally, no again, because the independence condition (8.5) fails spectacularly
(at least in the case of fair dice): For all legal values of s and p, we have
$0 < \Pr_{00}(S=s)\cdot\Pr_{00}(P=p) \le \frac16\cdot\frac19 = \frac1{54}$; this lies strictly between 0 and $\frac1{36}$,
so it can’t equal $\Pr_{00}(S=s \text{ and } P=p)$, which is a multiple of $\frac1{36}$.
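Condition (8.5) can be tested exhaustively on these 36-element spaces. In this sketch the random variables are plain functions of a roll `(d, e)` with faces encoded as 1…6, and `independent` is a hypothetical helper that compares the joint probability with the product of the marginal probabilities for every pair of values.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def pair_distribution(first, second):
    return {(d, e): first[d] * second[e] for d, e in product(FACES, FACES)}


def independent(space, X, Y):
    """Check condition (8.5): Pr(X=x and Y=y) == Pr(X=x) Pr(Y=y) for all x, y."""
    def pr(event):
        return sum((p for w, p in space.items() if event(w)), Fraction(0))

    xs = {X(w) for w in space}
    ys = {Y(w) for w in space}
    return all(
        pr(lambda w: X(w) == x and Y(w) == y)
        == pr(lambda w: X(w) == x) * pr(lambda w: Y(w) == y)
        for x in xs for y in ys
    )


def S1(w):
    return w[0]


def S2(w):
    return w[1]


def S(w):
    return w[0] + w[1]


def P(w):
    return w[0] * w[1]


pr00 = pair_distribution(fair, fair)
pr11 = pair_distribution(loaded, loaded)
pr01 = pair_distribution(fair, loaded)

print([independent(sp, S1, S2) for sp in (pr00, pr11, pr01)])
print(independent(pr00, S, P), independent(pr11, S, P))
```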

If we want to understand the typical behavior of a given random variable,
we often ask about its “average” value. But the notion of “average”
is ambiguous; people generally speak about three different kinds of averages
when a sequence of numbers is given:

• the mean (which is the sum of all values, divided by the number of
values);

• the median (which is the middle value, numerically);

• the mode (which is the value that occurs most often).

For example, the mean of (3, 1, 4, 1, 5) is $\frac{3+1+4+1+5}5 = 2.8$; the median is 3;
the mode is 1.
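For a finite sequence of numbers, Python’s standard `statistics` module computes all three kinds of average; `statistics.multimode` (Python 3.8 or later) returns every most frequent value, which matches treating the mode as a set.

```python
import statistics

data = [3, 1, 4, 1, 5]
print(statistics.mean(data))       # 2.8
print(statistics.median(data))     # 3
print(statistics.multimode(data))  # [1]
```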

But probability theorists usually work with random variables instead of 
with sequences of numbers, so we want to define the notion of an “average” for 
random variables too. Suppose we repeat an experiment over and over again,
making independent trials in such a way that each value of X occurs with
a frequency approximately proportional to its probability. (For example, we
might roll a pair of dice many times, observing the values of S and/or P.) We’d
like to define the average value of a random variable so that such experiments
will usually produce a sequence of numbers whose mean, median, or mode is
approximately the same as the mean, median, or mode of X, according to our
definitions.

Just Say No.

A dicey inequality.

Here’s how it can be done: The mean of a random real-valued variable X
on a probability space Ω is defined to be

$$\sum_{x\in X(\Omega)} x\cdot\Pr(X=x) \tag{8.6}$$

if this potentially infinite sum exists. (Here X(Ω) stands for the set of all
values that X can assume.) The median of X is defined to be the set of all x
such that

$$\Pr(X\le x)\ge\tfrac12 \quad\text{and}\quad \Pr(X\ge x)\ge\tfrac12\,. \tag{8.7}$$

And the mode of X is defined to be the set of all x such that

$$\Pr(X=x)\ge\Pr(X=x') \quad\text{for all } x'\in X(\Omega)\,. \tag{8.8}$$

In our dice-throwing example, the mean of S turns out to be
$2\cdot\frac1{36} + 3\cdot\frac2{36} + \cdots + 12\cdot\frac1{36} = 7$ in distribution $\Pr_{00}$, and it also turns out to be 7 in
distribution $\Pr_{11}$. The median and mode both turn out to be {7} as well,
in both distributions. So S has the same average under all three definitions.
On the other hand the P in distribution $\Pr_{00}$ turns out to have a mean value
of $\frac{49}4 = 12.25$; its median is {10}, and its mode is {6, 12}. The mean of P is
unchanged if we load the dice with distribution $\Pr_{11}$, but the median drops
to {8} and the mode becomes {6} alone.
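Definitions (8.6), (8.7) and (8.8) translate directly into code on a finite distribution `{x: Pr(X = x)}`. In this sketch `distribution`, `mean`, `median` and `mode` are hypothetical helpers, and for simplicity the median test (8.7) is applied only at values X actually takes; that suffices for the dice examples here, but in general the median set can also contain values lying between two attained values (when some $\Pr(X \le x)$ equals exactly $\frac12$).

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def distribution(first, second, X):
    """Return {x: Pr(X = x)} for X(d, e) on the two-dice space."""
    dist = {}
    for d, e in product(FACES, FACES):
        dist[X(d, e)] = dist.get(X(d, e), Fraction(0)) + first[d] * second[e]
    return dist


def mean(dist):
    """Definition (8.6)."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


def median(dist):
    """Definition (8.7), tested only at values that X actually takes."""
    half = Fraction(1, 2)
    return {x for x in dist
            if sum(p for y, p in dist.items() if y <= x) >= half
            and sum(p for y, p in dist.items() if y >= x) >= half}


def mode(dist):
    """Definition (8.8)."""
    top = max(dist.values())
    return {x for x, p in dist.items() if p == top}


for name, die in (("Pr00", fair), ("Pr11", loaded)):
    S = distribution(die, die, lambda d, e: d + e)
    P = distribution(die, die, lambda d, e: d * e)
    print(name, mean(S), median(S), mode(S), mean(P), median(P), mode(P))
```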

Probability theorists have a special name and notation for the mean of a
random variable: They call it the expected value, and write

$$EX = \sum_{\omega\in\Omega} X(\omega)\Pr(\omega)\,. \tag{8.9}$$

In our dice-throwing example, this sum has 36 terms (one for each element
of Ω), while (8.6) is a sum of only eleven terms. But both sums have the
same value, because they’re both equal to

$$\sum_{\substack{\omega\in\Omega\\ x\in X(\Omega)}} x\Pr(\omega)\,\bigl[x = X(\omega)\bigr]\,.$$
The mean of a random variable turns out to be more meaningful in
applications than the other kinds of averages, so we shall largely forget about
medians and modes from now on. We will use the terms “expected value,”
“mean,” and “average” almost interchangeably in the rest of this chapter.

I get it: On average, “average” means “mean.”

If X and Y are any two random variables defined on the same probability
space, then X + Y is also a random variable on that space. By formula (8.9),
the average of their sum is the sum of their averages:

$$E(X+Y) = \sum_{\omega\in\Omega}\bigl(X(\omega)+Y(\omega)\bigr)\Pr(\omega) = EX + EY\,. \tag{8.10}$$

Similarly, if α is any constant we have the simple rule

$$E(\alpha X) = \alpha EX\,. \tag{8.11}$$

But the corresponding rule for multiplication of random variables is more
complicated in general; the expected value is defined as a sum over elementary
events, and sums of products don’t often have a simple form. In spite of this
difficulty, there is a very nice formula for the mean of a product in the special
case that the random variables are independent:

$$E(XY) = (EX)(EY)\,,\qquad\text{if X and Y are independent.} \tag{8.12}$$

We can prove this by the distributive law for products,

$$\begin{aligned}
E(XY) &= \sum_{\omega\in\Omega} X(\omega)Y(\omega)\cdot\Pr(\omega)\\
&= \sum_{\substack{x\in X(\Omega)\\ y\in Y(\Omega)}} xy\cdot\Pr(X=x \text{ and } Y=y)\\
&= \sum_{\substack{x\in X(\Omega)\\ y\in Y(\Omega)}} xy\cdot\Pr(X=x)\Pr(Y=y)\\
&= \sum_{x\in X(\Omega)} x\Pr(X=x)\cdot\sum_{y\in Y(\Omega)} y\Pr(Y=y) = (EX)(EY)\,.
\end{aligned}$$

For example, we know that $S = S_1 + S_2$ and $P = S_1S_2$, when $S_1$ and $S_2$ are
the numbers of spots on the first and second of a pair of random dice. We have
$ES_1 = ES_2 = \frac72$, hence ES = 7; furthermore $S_1$ and $S_2$ are independent, so
$EP = \frac72\cdot\frac72 = \frac{49}4$, as claimed earlier. We also have $E(S+P) = ES + EP = 7 + \frac{49}4$.
But S and P are not independent, so we cannot assert that $E(SP) = 7\cdot\frac{49}4 = \frac{343}4$.
In fact, the expected value of SP turns out to equal $\frac{637}6$ in distribution
$\Pr_{00}$, while it equals 112 (exactly) in distribution $\Pr_{11}$.
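The expectation rules and the specific values quoted for S, P, S + P and SP can be confirmed by summing (8.9) over all 36 rolls. In this sketch with exact fractions, faces are encoded as 1…6, the loaded die is $\Pr_1$ from earlier, and the helper `expect` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def expect(die, X):
    """Definition (8.9) for two independent dice that both have distribution `die`."""
    return sum((X(d, e) * die[d] * die[e] for d, e in product(FACES, FACES)),
               Fraction(0))


def S(d, e):
    return d + e


def P(d, e):
    return d * e


def SP(d, e):
    return S(d, e) * P(d, e)


for name, die in (("Pr00", fair), ("Pr11", loaded)):
    ES, EP = expect(die, S), expect(die, P)
    # (8.10) always holds; (8.12) does not apply to SP because S and P are dependent
    print(name, ES, EP, expect(die, lambda d, e: S(d, e) + P(d, e)),
          expect(die, SP), ES * EP)
```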
(Slightly subtle point: There are two probability spaces, depending on what strategy we use; but $EX_1$ and $EX_2$ are the same in both.)


8.2 MEAN AND VARIANCE 

The next most important property of a random variable, after we
know its expected value, is its variance, defined as the mean square deviation
from the mean:

$$VX = E\bigl((X-EX)^2\bigr)\,. \tag{8.13}$$

If we denote EX by μ, the variance VX is the expected value of $(X-\mu)^2$. This
measures the “spread” of X’s distribution.

As a simple example of variance computation, let’s suppose we have just
been made an offer we can’t refuse: Someone has given us two gift certificates
for a certain lottery. The lottery organizers sell 100 tickets for each weekly
drawing. One of these tickets is selected by a uniformly random process (that
is, each ticket is equally likely to be chosen), and the lucky ticket holder
wins a hundred million dollars. The other 99 ticket holders win nothing.

We can use our gift in two ways: Either we buy two tickets in the same
lottery, or we buy one ticket in each of two lotteries. Which is a better
strategy? Let’s try to analyze this by letting $X_1$ and $X_2$ be random variables
that represent the amount we win on our first and second ticket. The expected
value of $X_1$, in millions, is

$$EX_1 = \tfrac{99}{100}\cdot0 + \tfrac1{100}\cdot100 = 1\,,$$

and the same holds for $X_2$. Expected values are additive, so our average total
winnings will be

$$E(X_1+X_2) = EX_1 + EX_2 = 2 \text{ million dollars,}$$

regardless of which strategy we adopt.

Still, the two strategies seem different. Let’s look beyond expected values
and study the exact probability distribution of $X_1 + X_2$:

                      winnings (millions)
                      0        100      200
same drawing          .9800    .0200
different drawings    .9801    .0198    .0001


If we buy two tickets in the same lottery we have a 98% chance of winning
nothing and a 2% chance of winning $100 million. If we buy them in different
lotteries we have a 98.01% chance of winning nothing, so this is slightly more
likely than before; and we have a 0.01% chance of winning $200 million, also
slightly more likely than before; and our chances of winning $100 million are
now 1.98%. So the distribution of X₁ + X₂ in this second situation is slightly
more spread out; the middle value, $100 million, is slightly less likely, but the
extreme values are slightly more likely.

It’s this notion of the spread of a random variable that the variance is
intended to capture. We measure the spread in terms of the squared deviation
of the random variable from its mean. In case 1, the variance is therefore

$$.98(0M-2M)^2 + .02(100M-2M)^2 = 196M^2\,;$$

in case 2 it is

$$.9801(0M-2M)^2 + .0198(100M-2M)^2 + .0001(200M-2M)^2 = 198M^2\,.$$

As we expected, the latter variance is slightly larger, because the distribution
of case 2 is slightly more spread out.

When we work with variances, everything is squared, so the numbers can
get pretty big. (The factor M² is one trillion, which is somewhat imposing
even for high-stakes gamblers.) To convert the numbers back to the more
meaningful original scale, we often take the square root of the variance. The
resulting number is called the standard deviation, and it is usually denoted
by the Greek letter σ:

$$\sigma = \sqrt{VX}\,. \tag{8.14}$$

The standard deviations of the random variables X₁ + X₂ in our two lottery
strategies are √(196M²) = 14.00M and √(198M²) ≈ 14.071247M. In some sense
the second alternative is about $71,247 riskier.
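The lottery numbers can be reproduced from the two distributions. In this sketch amounts are in millions, the two-lotteries distribution is obtained by convolving two independent one-ticket distributions, and the variance is computed straight from definition (8.13); `convolve`, `mean` and `variance` are hypothetical helper names.

```python
from fractions import Fraction
from math import sqrt

PRIZE = 100  # millions of dollars; one winning ticket out of 100 per drawing

single = {0: Fraction(99, 100), PRIZE: Fraction(1, 100)}
# Two of the 100 tickets in one drawing: we win exactly when one of them is drawn.
same = {0: Fraction(98, 100), PRIZE: Fraction(2, 100)}


def convolve(f, g):
    """Distribution of X + Y when X ~ f and Y ~ g are independent."""
    h = {}
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + p * q
    return h


def mean(dist):
    return sum((x * p for x, p in dist.items()), Fraction(0))


def variance(dist):
    """Definition (8.13): the mean square deviation from the mean."""
    mu = mean(dist)
    return sum(((x - mu) ** 2 * p for x, p in dist.items()), Fraction(0))


different = convolve(single, single)  # one ticket in each of two independent drawings

for name, dist in (("same drawing", same), ("different drawings", different)):
    v = variance(dist)
    print(name, mean(dist), v, round(sqrt(float(v)), 6))
```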

How does the variance help us choose a strategy? It’s not clear. The
strategy with higher variance is a little riskier; but do we get the most for our
money by taking more risks or by playing it safe? Suppose we had the chance
to buy 100 tickets instead of only two. Then we could have a guaranteed
victory in a single lottery (and the variance would be zero); or we could
gamble on a hundred different lotteries, with a .99¹⁰⁰ ≈ .366 chance of winning
nothing but also with a nonzero probability of winning up to $10,000,000,000.
To decide between these alternatives is beyond the scope of this book; all we
can do here is explain how to do the calculations.

In fact, there is a simpler way to calculate the variance, instead of using
the definition (8.13). (We suspect that there must be something going on
in the mathematics behind the scenes, because the variances in the lottery
example magically came out to be integer multiples of $M^2$.) We have

$$\begin{aligned}
E\bigl((X-EX)^2\bigr) &= E\bigl(X^2 - 2X(EX) + (EX)^2\bigr)\\
&= E(X^2) - 2(EX)(EX) + (EX)^2\,,
\end{aligned}$$

since (EX) is a constant; hence

$$VX = E(X^2) - (EX)^2\,. \tag{8.15}$$

Interesting: The variance of a dollar amount is expressed in units of square dollars.

Another way to reduce risk might be to bribe the lottery officials. I guess that’s where probability becomes indiscreet.

(N.B.: Opinions expressed in these margins do not necessarily represent the opinions of the management.)

“The variance is the mean of the square minus the square of the mean.” 

For example, the mean of $(X_1+X_2)^2$ comes to $.98(0M)^2 + .02(100M)^2 = 200M^2$
or to $.9801(0M)^2 + .0198(100M)^2 + .0001(200M)^2 = 202M^2$ in the
lottery problem. Subtracting $4M^2$ (the square of the mean) gives the results
we obtained the hard way.

There’s an even easier formula yet, if we want to calculate V(X + Y) when
X and Y are independent: We have

$$\begin{aligned}
E\bigl((X+Y)^2\bigr) &= E(X^2 + 2XY + Y^2)\\
&= E(X^2) + 2(EX)(EY) + E(Y^2)\,,
\end{aligned}$$

since we know that E(XY) = (EX)(EY) in the independent case. Therefore

$$\begin{aligned}
V(X+Y) &= E\bigl((X+Y)^2\bigr) - (EX+EY)^2\\
&= E(X^2) + 2(EX)(EY) + E(Y^2) - (EX)^2 - 2(EX)(EY) - (EY)^2\\
&= E(X^2) - (EX)^2 + E(Y^2) - (EY)^2\\
&= VX + VY\,.
\end{aligned}\tag{8.16}$$

“The variance of a sum of independent random variables is the sum of their
variances.” For example, the variance of the amount we can win with a single
lottery ticket is

$$E(X_1^2) - (EX_1)^2 = .99(0M)^2 + .01(100M)^2 - (1M)^2 = 99M^2\,.$$

Therefore the variance of the total winnings of two lottery tickets in two
separate (independent) lotteries is $2\times99M^2 = 198M^2$. And the corresponding
variance for n independent lottery tickets is $n\times99M^2$.
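Formulas (8.15) and (8.16) can be checked on the same lottery model. This sketch (hypothetical helpers `convolve`, `moment` and `variance`; amounts in millions) uses the shortcut “mean of the square minus square of the mean” and builds the total for n independent tickets by repeated convolution, so the variance should come out as 99n.

```python
from fractions import Fraction

PRIZE = 100  # millions of dollars
single = {0: Fraction(99, 100), PRIZE: Fraction(1, 100)}
same = {0: Fraction(98, 100), PRIZE: Fraction(2, 100)}


def convolve(f, g):
    """Distribution of X + Y when X ~ f and Y ~ g are independent."""
    h = {}
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + p * q
    return h


def moment(dist, k):
    """E(X^k) for a finite distribution {x: Pr(X = x)}."""
    return sum((x ** k * p for x, p in dist.items()), Fraction(0))


def variance(dist):
    """Shortcut (8.15): the mean of the square minus the square of the mean."""
    return moment(dist, 2) - moment(dist, 1) ** 2


total = single
for n in range(1, 6):
    print(n, variance(total))          # (8.16) predicts 99 * n
    total = convolve(total, single)

print(moment(same, 2), variance(same))
```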

The variance of the dice-roll sum S drops out of this same formula, since
$S = S_1 + S_2$ is the sum of two independent random variables. We have

$$VS_1 = \tfrac16(1^2+2^2+3^2+4^2+5^2+6^2) - \bigl(\tfrac72\bigr)^2 = \tfrac{35}{12}$$

when the dice are fair; hence $VS = \frac{35}{12}+\frac{35}{12} = \frac{35}6$. The loaded die has

$$VS_1 = \tfrac18(2\cdot1^2+2^2+3^2+4^2+5^2+2\cdot6^2) - \bigl(\tfrac72\bigr)^2 = \tfrac{15}4\,;$$

hence $VS = \frac{15}2 = 7.5$ when both dice are loaded. Notice that the loaded dice
give S a larger variance, although S actually assumes its average value 7 more
often than it would with fair dice. If our goal is to shoot lots of lucky 7’s, the
variance is not our best indicator of success.
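A quick exact check of these dice variances, using (8.15) for one die and (8.16) for the sum of two independent dice. The loaded distribution is the $\Pr_1$ defined earlier (faces encoded as 1…6), and `variance` and `pr_seven` are hypothetical helper names.

```python
from fractions import Fraction

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def variance(die):
    """VS_1 = E(S_1^2) - (ES_1)^2, formula (8.15), for a single die."""
    mean = sum(d * p for d, p in die.items())
    mean_square = sum(d * d * p for d, p in die.items())
    return mean_square - mean ** 2


def pr_seven(die):
    """Pr(S = 7) for two independent dice with the same distribution."""
    return sum(die[d] * die[7 - d] for d in FACES)


for name, die in (("fair", fair), ("loaded", loaded)):
    # (8.16): S = S_1 + S_2 with S_1, S_2 independent, so VS = 2 * VS_1
    print(name, variance(die), 2 * variance(die), pr_seven(die))
```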

OK, we have learned how to compute variances. But we haven’t really
seen a good reason why the variance is a natural thing to compute. Everybody
does it, but why? The main reason is Chebyshev’s inequality ([29] and [57]),
which states that the variance has a significant property:

$$\Pr\bigl((X-EX)^2\ge\alpha\bigr) \le VX/\alpha\,,\qquad\text{for all } \alpha>0. \tag{8.17}$$

(This is different from the monotonic inequalities of Chebyshev that we
encountered in Chapter 2.) Very roughly, (8.17) tells us that a random variable X
will rarely be far from its mean EX if its variance VX is small. The proof is
amazingly simple. We have

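$$\begin{aligned}
VX &= \sum_{\omega\in\Omega}\bigl(X(\omega)-EX\bigr)^2\Pr(\omega)\\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2\ge\alpha}}\bigl(X(\omega)-EX\bigr)^2\Pr(\omega)\\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2\ge\alpha}}\alpha\Pr(\omega) = \alpha\cdot\Pr\bigl((X-EX)^2\ge\alpha\bigr)\,;
\end{aligned}$$

dividing by α finishes the proof.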

If we write μ for the mean and σ for the standard deviation, and if we
replace α by $c^2VX$ in (8.17), the condition $(X-EX)^2 \ge c^2VX$ is the same as
$(X-\mu)^2 \ge (c\sigma)^2$; hence (8.17) says that

$$\Pr\bigl(|X-\mu|\ge c\sigma\bigr) \le 1/c^2\,. \tag{8.18}$$

Thus, X will lie within c standard deviations of its mean value except with
probability at most $1/c^2$. A random variable will lie within 2σ of μ at least
75% of the time; it will lie between $\mu-10\sigma$ and $\mu+10\sigma$ at least 99% of the
time. These are the cases α = 4VX and α = 100VX of Chebyshev’s inequality.
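Chebyshev’s bound is valid for every distribution but is often far from tight. This sketch computes the exact tail probability $\Pr(|S-\mu|\ge c\sigma)$ for the dice sum S and compares it with $1/c^2$; to stay in exact rational arithmetic it tests the equivalent condition $(S-\mu)^2 \ge c^2\,VS$ and uses only rational values of c (an assumption made for exactness). The helpers `sum_distribution` and `chebyshev_tail` are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}


def sum_distribution(die):
    """{s: Pr(S = s)} for two independent dice with distribution `die`."""
    dist = {}
    for d, e in product(FACES, FACES):
        dist[d + e] = dist.get(d + e, Fraction(0)) + die[d] * die[e]
    return dist


def chebyshev_tail(dist, c):
    """Return (Pr(|X - mu| >= c*sigma), 1/c^2), both as exact fractions.

    |X - mu| >= c*sigma is tested as (X - mu)^2 >= c^2 * VX, avoiding square roots.
    """
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    tail = sum((p for x, p in dist.items() if (x - mu) ** 2 >= c * c * var),
               Fraction(0))
    return tail, Fraction(1) / c ** 2


for name, die in (("fair", fair), ("loaded", loaded)):
    dist = sum_distribution(die)
    for c in (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(3)):
        tail, bound = chebyshev_tail(dist, c)
        print(name, c, tail, bound, tail <= bound)
```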

If we roll a pair of fair dice n times, the total value of the n rolls will
almost always be near 7n, for large n. Here’s why: The variance of n
independent rolls is $\frac{35}6n$. A variance of $\frac{35}6n$ means a standard deviation of
only

$$\sqrt{\tfrac{35}6n}\,.$$

So Chebyshev’s inequality tells us that the final sum will lie between

$$7n - 10\sqrt{\tfrac{35}6n} \quad\text{and}\quad 7n + 10\sqrt{\tfrac{35}6n}$$

in at least 99% of all experiments when a pair of fair dice is rolled n times.
For example, the odds are better than 99 to 1 that the total value of a million
rolls will be between 6.976 million and 7.024 million.

If he proved it in 1867, it’s a classic ’67 Chebyshev.
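The million-roll figures come from a short computation with the standard deviation $\sqrt{35n/6}$ of the total and $n = 10^6$:

```python
from math import sqrt

n = 10 ** 6                        # number of rolls of a pair of fair dice
sigma_total = sqrt(35 / 6 * n)     # standard deviation of the total
low = 7 * n - 10 * sigma_total     # ten standard deviations below the mean
high = 7 * n + 10 * sigma_total    # ten standard deviations above the mean
print(round(sigma_total), round(low / 1e6, 3), round(high / 1e6, 3))
```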

In general, let X be any random variable over a probability space Ω, having
finite mean μ and finite standard deviation σ. Then we can consider the
probability space $\Omega^n$ whose elementary events are n-tuples $(\omega_1, \omega_2, \ldots, \omega_n)$
with each $\omega_k\in\Omega$, and whose probabilities are

$$\Pr(\omega_1,\omega_2,\ldots,\omega_n) = \Pr(\omega_1)\Pr(\omega_2)\ldots\Pr(\omega_n)\,.$$

If we now define random variables $X_k$ by the formula

$$X_k(\omega_1,\omega_2,\ldots,\omega_n) = X(\omega_k)\,,$$

the quantity

$$X_1 + X_2 + \cdots + X_n$$

is a sum of n independent random variables, which corresponds to taking n
independent “samples” of X on Ω and adding them together. The mean of
$X_1 + X_2 + \cdots + X_n$ is nμ, and the standard deviation is $\sqrt n\,\sigma$; hence the
average of the n samples,

$$\frac1n(X_1 + X_2 + \cdots + X_n)\,,$$

will lie between $\mu - 10\sigma/\sqrt n$ and $\mu + 10\sigma/\sqrt n$ at least 99% of the time. In
other words, if we choose a large enough value of n, the average of n
independent samples will almost always be very near the expected value EX. (An
even stronger theorem called the Strong Law of Large Numbers is proved in
textbooks of probability theory; but the simple consequence of Chebyshev’s
inequality that we have just derived is enough for our purposes.)

(That is, the average will fall between the stated limits in at least 99% of all
cases when we look at a set of n independent samples, for any fixed value
of n. Don’t misunderstand this as a statement about the averages of an
infinite sequence $X_1, X_2, X_3, \ldots$ as n varies.)
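A small simulation illustrates the statement about averages of n samples. This sketch rests on stated assumptions: it uses Python’s `random.Random` with an arbitrary fixed seed, fair dice, n = 1000 rolls per experiment and 200 experiments, and records how often the sample average of S lands within $\mu \pm 10\sigma/\sqrt n$. The observed fraction is itself random; Chebyshev only guarantees that each experiment succeeds with probability at least 0.99.

```python
import random
from math import sqrt

MU, SIGMA = 7, sqrt(35 / 6)   # mean and standard deviation of S for fair dice


def average_spot_sum(rng, n):
    """Average of S over n independent rolls of a pair of fair dice."""
    return sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)) / n


rng = random.Random(8)        # arbitrary fixed seed so that runs are repeatable
n, experiments = 1000, 200
half_width = 10 * SIGMA / sqrt(n)
inside = sum(abs(average_spot_sum(rng, n) - MU) <= half_width
             for _ in range(experiments))
print(inside / experiments)   # each experiment lands inside with probability >= 0.99
```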

Sometimes we don’t know the characteristics of a probability space, and
we want to estimate the mean of a random variable X by sampling its value
repeatedly. (For example, we might want to know the average temperature
at noon on a January day in San Francisco; or we may wish to know the
mean life expectancy of insurance agents.) If we have obtained independent
empirical observations $X_1, X_2, \ldots, X_n$, we can guess that the true mean is
approximately

$$\widehat{E}X = \frac{X_1 + X_2 + \cdots + X_n}{n}\,. \tag{8.19}$$
And we can also make an estimate of the variance, using the formula

$$\widehat{V}X = \frac{X_1^2+X_2^2+\cdots+X_n^2}{n-1} - \frac{(X_1+X_2+\cdots+X_n)^2}{n(n-1)}. \tag{8.20}$$

The $(n-1)$'s in this formula look like typographic errors; it seems they should be $n$'s, as in (8.19), because the true variance $VX$ is defined by expected values in (8.15). Yet we get a better estimate with $n-1$ instead of $n$ here, because definition (8.20) implies that

$$E(\widehat{V}X) = VX. \tag{8.21}$$

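Here's why:

$$\begin{aligned}
E(\widehat{V}X) &= \frac{1}{n-1}\,E\Big(\sum_{k=1}^n X_k^2 - \frac1n\sum_{j=1}^n\sum_{k=1}^n X_jX_k\Big)\\
&= \frac{1}{n-1}\Big(\sum_{k=1}^n E(X_k^2) - \frac1n\sum_{j=1}^n\sum_{k=1}^n E(X_jX_k)\Big)\\
&= \frac{1}{n-1}\Big(\sum_{k=1}^n E(X^2) - \frac1n\sum_{j=1}^n\sum_{k=1}^n\big(E(X)^2[j\ne k] + E(X^2)[j=k]\big)\Big)\\
&= \frac{1}{n-1}\Big(nE(X^2) - \frac1n\big(nE(X^2) + n(n-1)E(X)^2\big)\Big)\\
&= E(X^2) - E(X)^2 = VX.
\end{aligned}$$

(This derivation uses the independence of the observations when it replaces $E(X_jX_k)$ by $(EX)^2[j\ne k] + E(X^2)[j=k]$.)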
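Because a die has only six outcomes, (8.21) can be checked exactly rather than by simulation. The sketch below (the helper names are our own) averages the estimate (8.20) over all $6^n$ equally likely outcomes of $n$ fair-die rolls with exact rational arithmetic. For comparison it also averages the version that divides by $n$; that one comes out too small by the factor $(n-1)/n$.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, unbiased=True):
    """The estimate (8.20); unbiased=False divides by n instead of n - 1."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    if unbiased:
        return Fraction(s2, n - 1) - Fraction(s1 * s1, n * (n - 1))
    return Fraction(s2, n) - Fraction(s1 * s1, n * n)

def expected_estimate(n, unbiased=True):
    """Exact expected value of the estimate over all 6**n outcomes of n rolls."""
    outcomes = list(product(range(1, 7), repeat=n))
    total = sum(sample_variance(xs, unbiased) for xs in outcomes)
    return total / len(outcomes)

print(expected_estimate(3))          # 35/12, the true variance of one die
print(expected_estimate(3, False))   # 35/18 = (2/3)(35/12), biased low
```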

In practice, experimental results about a random variable $X$ are usually obtained by calculating a sample mean $\hat\mu = \widehat{E}X$ and a sample standard deviation $\hat\sigma = \sqrt{\widehat{V}X}$, and presenting the answer in the form '$\hat\mu \pm \hat\sigma/\sqrt n$'. For example, here are ten rolls of two supposedly fair dice:



The sample mean of the spot sum $S$ is

$$\hat\mu = (7+11+8+5+4+6+10+8+8+7)/10 = 7.4;$$

the sample variance is

$$(7^2+11^2+8^2+5^2+4^2+6^2+10^2+8^2+8^2+7^2-10\hat\mu^2)/9 \approx 2.1^2.$$
Not to be confused
with a Fibonacci
number.


One the average. 


We estimate the average spot sum of these dice to be $7.4 \pm 2.1/\sqrt{10} = 7.4 \pm 0.7$, on the basis of these experiments.
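The arithmetic of this example is easy to reproduce. A minimal sketch, using only the ten spot sums quoted above:

```python
import math

rolls = [7, 11, 8, 5, 4, 6, 10, 8, 8, 7]          # the ten spot sums quoted above
n = len(rolls)
mu_hat = sum(rolls) / n                             # sample mean, (8.19)
var_hat = (sum(x * x for x in rolls) - n * mu_hat**2) / (n - 1)   # (8.20)
sigma_hat = math.sqrt(var_hat)
print(f"sigma_hat = {sigma_hat:.2f}")               # sigma_hat = 2.12
print(f"{mu_hat:.1f} ± {sigma_hat / math.sqrt(n):.1f}")   # 7.4 ± 0.7
```

Note that `var_hat` is (8.20) rewritten as $(\sum x^2 - n\hat\mu^2)/(n-1)$, which is algebraically the same.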

Let's work one more example of means and variances, in order to show how they can be calculated theoretically instead of empirically. One of the questions we considered in Chapter 5 was the "football victory problem," where $n$ hats are thrown into the air and the result is a random permutation of hats. We showed in equation (5.51) that there's a probability of $n¡/n! \approx 1/e$ that nobody gets the right hat back. We also derived the formula

$$P(n,k) = \frac{1}{n!}\binom{n}{k}(n-k)¡ = \frac{1}{k!}\,\frac{(n-k)¡}{(n-k)!} \tag{8.22}$$

for the probability that exactly $k$ people end up with their own hats.

Restating these results in the formalism just learned, we can consider the probability space $\Pi_n$ of all $n!$ permutations $\pi$ of $\{1,2,\ldots,n\}$, where $\Pr(\pi) = 1/n!$ for all $\pi\in\Pi_n$. The random variable

$$F_n(\pi) = \text{number of "fixed points" of } \pi, \qquad \text{for } \pi\in\Pi_n,$$

measures the number of correct hat-falls in the football victory problem. Equation (8.22) gives $\Pr(F_n=k)$, but let's pretend that we don't know any such formula; we merely want to study the average value of $F_n$, and its standard deviation.

The average value is, in fact, extremely easy to calculate, avoiding all the complexities of Chapter 5. We simply observe that

$$F_n(\pi) = F_{n,1}(\pi) + F_{n,2}(\pi) + \cdots + F_{n,n}(\pi),$$
$$F_{n,k}(\pi) = [\text{position } k \text{ of } \pi \text{ is a fixed point}], \qquad \text{for } \pi\in\Pi_n.$$

Hence

$$EF_n = EF_{n,1} + EF_{n,2} + \cdots + EF_{n,n}.$$

And the expected value of $F_{n,k}$ is simply the probability that $F_{n,k}=1$, which is $1/n$ because exactly $(n-1)!$ of the $n!$ permutations $\pi = \pi_1\pi_2\ldots\pi_n\in\Pi_n$ have $\pi_k = k$. Therefore

$$EF_n = n/n = 1, \qquad \text{for } n>0. \tag{8.23}$$

On the average, one hat will be in its correct place. “A random permutation 
has one fixed point, on the average.” 

Now what's the standard deviation? This question is more difficult, because the $F_{n,k}$'s are not independent of each other. But we can calculate the variance by analyzing the mutual dependencies among them:

$$\begin{aligned}
E(F_n^2) &= E\Big(\Big(\sum_{k=1}^n F_{n,k}\Big)^2\Big) = E\Big(\sum_{j=1}^n\sum_{k=1}^n F_{n,j}F_{n,k}\Big)\\
&= \sum_{j=1}^n\sum_{k=1}^n E(F_{n,j}F_{n,k}) = \sum_{1\le k\le n}E(F_{n,k}^2) + 2\sum_{1\le j<k\le n}E(F_{n,j}F_{n,k}).
\end{aligned}$$

(We used a similar trick when we derived (2.33) in Chapter 2.) Now $F_{n,k}^2 = F_{n,k}$, since $F_{n,k}$ is either 0 or 1; hence $E(F_{n,k}^2) = EF_{n,k} = 1/n$ as before. And if $j<k$ we have $E(F_{n,j}F_{n,k}) = \Pr(\pi \text{ has both } j \text{ and } k \text{ as fixed points}) = (n-2)!/n! = 1/n(n-1)$. Therefore

$$E(F_n^2) = \frac nn + 2\binom n2\frac{1}{n(n-1)} = 2, \qquad\text{for } n\ge2. \tag{8.24}$$

(As a check when $n=3$, we have $\frac26\cdot0^2 + \frac36\cdot1^2 + \frac06\cdot2^2 + \frac16\cdot3^2 = 2$.) The variance is $E(F_n^2) - (EF_n)^2 = 1$, so the standard deviation (like the mean) is 1. "A random permutation of $n\ge2$ elements has $1\pm1$ fixed points."
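For small $n$ we can confirm the mean and variance by brute force, enumerating all $n!$ permutations. A sketch using Python's `itertools.permutations` (the function name is our own):

```python
from fractions import Fraction
from itertools import permutations

def fixed_point_stats(n):
    """Exact mean and variance of F_n, the number of fixed points of a
    uniformly random permutation of n elements, over all n! permutations."""
    counts = [sum(1 for k, pk in enumerate(perm) if pk == k)
              for perm in permutations(range(n))]
    total = len(counts)                                  # n!
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean**2

for n in range(1, 8):
    print(n, *fixed_point_stats(n))   # n = 1: mean 1, variance 0; n >= 2: 1 and 1
```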

8.3 PROBABILITY GENERATING FUNCTIONS 

If $X$ is a random variable that takes only nonnegative integer values, we can capture its probability distribution nicely by using the techniques of Chapter 7. The probability generating function or pgf of $X$ is

$$G_X(z) = \sum_{k\ge0}\Pr(X=k)\,z^k. \tag{8.25}$$

This power series in $z$ contains all the information about the random variable $X$. We can also express it in two other ways:

$$G_X(z) = \sum_{\omega\in\Omega}\Pr(\omega)\,z^{X(\omega)} = E(z^X). \tag{8.26}$$

The coefficients of $G_X(z)$ are nonnegative, and they sum to 1; the latter condition can be written

$$G_X(1) = 1. \tag{8.27}$$

Conversely, any power series $G(z)$ with nonnegative coefficients and with $G(1)=1$ is the pgf of some random variable.
The nicest thing about pgf's is that they usually simplify the computation of means and variances. For example, the mean is easily expressed:

$$EX = \sum_{k\ge0}k\cdot\Pr(X=k) = \sum_{k\ge0}\Pr(X=k)\cdot kz^{k-1}\Big|_{z=1} = G_X'(1). \tag{8.28}$$

We simply differentiate the pgf with respect to $z$ and set $z=1$.

The variance is only slightly more complicated:

$$E(X^2) = \sum_{k\ge0}k^2\cdot\Pr(X=k) = \sum_{k\ge0}\Pr(X=k)\cdot\big(k(k-1)z^{k-2} + kz^{k-1}\big)\Big|_{z=1} = G_X''(1) + G_X'(1).$$

Therefore

$$VX = G_X''(1) + G_X'(1) - G_X'(1)^2. \tag{8.29}$$

Equations (8.28) and (8.29) tell us that we can compute the mean and variance if we can compute the values of two derivatives, $G_X'(1)$ and $G_X''(1)$. We don't have to know a closed form for the probabilities; we don't even have to know a closed form for $G_X(z)$ itself.

It is convenient to write

$$\text{Mean}(G) = G'(1), \tag{8.30}$$
$$\text{Var}(G) = G''(1) + G'(1) - G'(1)^2, \tag{8.31}$$

when $G$ is any function, since we frequently want to compute these combinations of derivatives.
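Formulas (8.30) and (8.31) translate directly into code when the pgf is a polynomial given by its list of coefficients. A sketch with exact fractions; the function names are illustrative, not from the text. It relies on the fact that differentiating $z^k$ $m$ times and setting $z=1$ gives the falling power $k(k-1)\ldots(k-m+1)$.

```python
from fractions import Fraction

def derivative_at_one(coeffs, order):
    """G^(order)(1) for the polynomial G(z) = sum(coeffs[k] * z**k)."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        falling = 1
        for i in range(order):
            falling *= k - i          # k(k-1)...(k-order+1); zero when k < order
        total += c * falling
    return total

def mean(coeffs):
    return derivative_at_one(coeffs, 1)                 # (8.30)

def var(coeffs):
    g1 = derivative_at_one(coeffs, 1)
    g2 = derivative_at_one(coeffs, 2)
    return g2 + g1 - g1**2                              # (8.31)

die = [Fraction(0)] + [Fraction(1, 6)] * 6             # pgf (z + ... + z^6)/6
print(mean(die), var(die))                              # 7/2 35/12
```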

The second-nicest thing about pgf's is that they are comparatively simple functions of $z$, in many important cases. For example, let's look at the uniform distribution of order $n$, in which the random variable takes on each of the values $\{0,1,\ldots,n-1\}$ with probability $1/n$. The pgf in this case is

$$U_n(z) = \frac1n(1+z+\cdots+z^{n-1}) = \frac1n\,\frac{1-z^n}{1-z}, \qquad\text{for } n\ge1. \tag{8.32}$$

We have a closed form for $U_n(z)$ because this is a geometric series.

But this closed form proves to be somewhat embarrassing: When we plug in $z=1$ (the value of $z$ that's most critical for the pgf), we get the undefined ratio $0/0$, even though $U_n(z)$ is a polynomial that is perfectly well defined at any value of $z$. The value $U_n(1)=1$ is obvious from the non-closed form



$(1+z+\cdots+z^{n-1})/n$, yet it seems that we must resort to L'Hospital's rule to find $\lim_{z\to1}U_n(z)$ if we want to determine $U_n(1)$ from the closed form. The determination of $U_n'(1)$ by L'Hospital's rule will be even harder, because there will be a factor of $(z-1)^2$ in the denominator; $U_n''(1)$ will be harder still.

Luckily there's a nice way out of this dilemma. If $G(z) = \sum_{n\ge0}g_nz^n$ is any power series that converges for at least one value of $z$ with $|z|>1$, the power series $G'(z) = \sum_{n\ge0}ng_nz^{n-1}$ will also have this property, and so will $G''(z)$, $G'''(z)$, etc. Therefore by Taylor's theorem we can write

$$G(1+t) = G(1) + \frac{G'(1)}{1!}t + \frac{G''(1)}{2!}t^2 + \frac{G'''(1)}{3!}t^3 + \cdots; \tag{8.33}$$

all derivatives of $G(z)$ at $z=1$ will appear as coefficients, when $G(1+t)$ is expanded in powers of $t$.

For example, the derivatives of the uniform pgf $U_n(z)$ are easily found in this way:

$$U_n(1+t) = \frac1n\,\frac{(1+t)^n-1}{t} = \frac1n\binom n1 + \frac1n\binom n2t + \frac1n\binom n3t^2 + \cdots + \frac1n\binom nnt^{n-1}.$$

Comparing this to (8.33) gives

$$U_n(1) = 1; \qquad U_n'(1) = \frac{n-1}{2}; \qquad U_n''(1) = \frac{(n-1)(n-2)}{3}; \tag{8.34}$$

and in general $U_n^{(m)}(1) = (n-1)^{\underline m}/(m+1)$, although we need only the cases $m=1$ and $m=2$ to compute the mean and the variance. The mean of the uniform distribution is

$$U_n'(1) = \frac{n-1}{2}, \tag{8.35}$$

and the variance is

$$U_n''(1) + U_n'(1) - U_n'(1)^2 = 4\,\frac{(n-1)(n-2)}{12} + 6\,\frac{(n-1)}{12} - 3\,\frac{(n-1)^2}{12} = \frac{n^2-1}{12}. \tag{8.36}$$
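The expansion (8.33) also gives a practical way to get every derivative at once from a polynomial pgf. By the binomial theorem the coefficient of $t^m$ in $G(1+t) = \sum_k g_k(1+t)^k$ is $\sum_k g_k\binom km$, so $G^{(m)}(1) = m!\sum_k g_k\binom km$. The sketch below (function names are our own) uses this to check (8.34) through (8.36) for $U_6$.

```python
from fractions import Fraction
from math import comb, factorial

def derivatives_at_one(coeffs):
    """[G(1), G'(1), G''(1), ...] for G(z) = sum(coeffs[k] * z**k), read off
    from the expansion of G(1 + t) in powers of t, as in (8.33)."""
    degree = len(coeffs) - 1
    # [t^m] G(1+t) = sum_k g_k * C(k, m), and G^(m)(1) = m! * [t^m] G(1+t)
    return [factorial(m) * sum(g * comb(k, m) for k, g in enumerate(coeffs))
            for m in range(degree + 1)]

def uniform_pgf(n):
    """U_n(z) = (1 + z + ... + z^(n-1)) / n as a coefficient list."""
    return [Fraction(1, n)] * n

d = derivatives_at_one(uniform_pgf(6))
mean = d[1]                           # (n - 1)/2 by (8.35)
variance = d[2] + d[1] - d[1] ** 2    # (n^2 - 1)/12 by (8.36)
print(d[0], mean, variance)           # 1 5/2 35/12
```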


The third-nicest thing about pgf's is that the product of pgf's corresponds to the sum of independent random variables. We learned in Chapters 5 and 7 that the product of generating functions corresponds to the convolution of sequences; but it's even more important in applications to know that the convolution of probabilities corresponds to the sum of independent random variables. Indeed, if $X$ and $Y$ are random variables that take on nothing but integer values, the probability that $X+Y=n$ is

$$\Pr(X+Y=n) = \sum_k\Pr(X=k \text{ and } Y=n-k).$$

If $X$ and $Y$ are independent, we now have

$$\Pr(X+Y=n) = \sum_k\Pr(X=k)\Pr(Y=n-k),$$

a convolution. Therefore, and this is the punch line,

$$G_{X+Y}(z) = G_X(z)\,G_Y(z), \qquad\text{if } X \text{ and } Y \text{ are independent.} \tag{8.37}$$

Earlier this chapter we observed that $V(X+Y) = VX + VY$ when $X$ and $Y$ are independent. Let $F(z)$ and $G(z)$ be the pgf's for $X$ and $Y$, and let $H(z)$ be the pgf for $X+Y$. Then

$$H(z) = F(z)\,G(z),$$

and our formulas (8.28) through (8.31) for mean and variance tell us that we must have

$$\text{Mean}(H) = \text{Mean}(F) + \text{Mean}(G); \tag{8.38}$$
$$\text{Var}(H) = \text{Var}(F) + \text{Var}(G). \tag{8.39}$$

These formulas, which are properties of the derivatives $\text{Mean}(H) = H'(1)$ and $\text{Var}(H) = H''(1) + H'(1) - H'(1)^2$, aren't valid for arbitrary function products $H(z) = F(z)G(z)$; we have

$$H'(z) = F'(z)G(z) + F(z)G'(z),$$
$$H''(z) = F''(z)G(z) + 2F'(z)G'(z) + F(z)G''(z).$$

But if we set $z=1$, we can see that (8.38) and (8.39) will be valid in general provided only that

$$F(1) = G(1) = 1 \tag{8.40}$$


I'll graduate magna 
cum ulant. 


and that the derivatives exist. The "probabilities" don't have to be in $[0\,..\,1]$ for these formulas to hold. We can normalize the functions $F(z)$ and $G(z)$ by dividing through by $F(1)$ and $G(1)$ in order to make this condition valid, whenever $F(1)$ and $G(1)$ are nonzero.
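Condition (8.40) can be seen at work by multiplying coefficient lists. In this sketch (helper names are our own) we multiply three different functions by $U_4(z)$: a fair die's pgf, the non-pgf $3-2z$ (negative coefficient, but value 1 at $z=1$), and $1+z$ (value 2 at $z=1$). Means and variances add in the first two cases and not in the third.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Coefficients of F(z)G(z): the convolution of the two coefficient lists."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def mean_of(g):
    """Mean(G) = G'(1), (8.30)."""
    return sum(k * c for k, c in enumerate(g))

def var_of(g):
    """Var(G) = G''(1) + G'(1) - G'(1)^2, (8.31)."""
    g1 = mean_of(g)
    g2 = sum(k * (k - 1) * c for k, c in enumerate(g))
    return g2 + g1 - g1 ** 2

die = [Fraction(0)] + [Fraction(1, 6)] * 6    # (z + ... + z^6)/6
u4 = [Fraction(1, 4)] * 4                      # U_4(z)
odd = [Fraction(3), Fraction(-2)]              # 3 - 2z: not a pgf, but odd(1) = 1
big = [Fraction(1), Fraction(1)]               # 1 + z: big(1) = 2 violates (8.40)

for f in (die, odd, big):
    h = poly_mul(f, u4)
    print(mean_of(h) == mean_of(f) + mean_of(u4),
          var_of(h) == var_of(f) + var_of(u4))   # True True, True True, False False
```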

Mean and variance aren't the whole story. They are merely two of an infinite series of so-called cumulant statistics introduced by the Danish astronomer Thorvald Nicolai Thiele [351] in 1903. The first two cumulants $\kappa_1$ and $\kappa_2$ of a random variable are what we have called the mean and the variance; there also are higher-order cumulants that express more subtle properties of a distribution. The general formula

$$\ln G(e^t) = \frac{\kappa_1}{1!}t + \frac{\kappa_2}{2!}t^2 + \frac{\kappa_3}{3!}t^3 + \frac{\kappa_4}{4!}t^4 + \cdots \tag{8.41}$$

defines the cumulants of all orders, when $G(z)$ is the pgf of a random variable.
Let's look at cumulants more closely. If $G(z)$ is the pgf for $X$, we have

$$G(e^t) = \sum_{k\ge0}\Pr(X=k)e^{kt} = \sum_{k,m\ge0}\Pr(X=k)\frac{k^mt^m}{m!} = 1 + \frac{\mu_1}{1!}t + \frac{\mu_2}{2!}t^2 + \frac{\mu_3}{3!}t^3 + \cdots, \tag{8.42}$$

where

$$\mu_m = \sum_{k\ge0}k^m\Pr(X=k) = E(X^m). \tag{8.43}$$

This quantity $\mu_m$ is called the "$m$th moment" of $X$. We can take exponentials on both sides of (8.41), obtaining another formula for $G(e^t)$:

$$G(e^t) = 1 + \frac{(\kappa_1t + \frac12\kappa_2t^2 + \cdots)}{1!} + \frac{(\kappa_1t + \frac12\kappa_2t^2 + \cdots)^2}{2!} + \cdots = 1 + \kappa_1t + \tfrac12(\kappa_2 + \kappa_1^2)t^2 + \cdots.$$

Equating coefficients of powers of $t$ leads to a series of formulas

$$\kappa_1 = \mu_1, \tag{8.44}$$
$$\kappa_2 = \mu_2 - \mu_1^2, \tag{8.45}$$
$$\kappa_3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3, \tag{8.46}$$
$$\kappa_4 = \mu_4 - 4\mu_1\mu_3 + 12\mu_1^2\mu_2 - 3\mu_2^2 - 6\mu_1^4, \tag{8.47}$$
$$\kappa_5 = \mu_5 - 5\mu_1\mu_4 + 20\mu_1^2\mu_3 - 10\mu_2\mu_3 + 30\mu_1\mu_2^2 - 60\mu_1^3\mu_2 + 24\mu_1^5, \tag{8.48}$$

defining the cumulants in terms of the moments. Notice that $\kappa_2$ is indeed the variance, $E(X^2) - (EX)^2$, as claimed.

Equation (8.41) makes it clear that the cumulants defined by the product 
F(z)G(z) of two pgf’s will be the sums of the corresponding cumulants of F(z) 
and G(z), because logarithms of products are sums. Therefore all cumulants 
of the sum of independent random variables are additive, just as the mean and 
variance are. This property makes cumulants more important than moments. 
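Additivity of cumulants is easy to check numerically. This sketch is an illustration, not the text's method: instead of expanding the logarithm in (8.41) directly, it converts moments to cumulants with the standard recurrence $\kappa_n = \mu_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k\mu_{n-k}$, which follows from differentiating $G(e^t) = \exp\ln G(e^t)$. It then compares one fair die with the sum of two.

```python
from fractions import Fraction
from math import comb

def moments(pgf, count):
    """mu_0, ..., mu_count where mu_m = E(X^m) = sum_k k^m Pr(X = k), as in (8.43)."""
    return [sum(p * k**m for k, p in enumerate(pgf)) for m in range(count + 1)]

def cumulants(mu):
    """kappa_1, ..., kappa_n from mu_0 = 1, mu_1, ..., mu_n via the recurrence
    kappa_n = mu_n - sum_{k=1}^{n-1} C(n-1, k-1) kappa_k mu_{n-k}."""
    kappa = [None]                                  # kappa[0] is unused
    for n in range(1, len(mu)):
        s = sum(comb(n - 1, k - 1) * kappa[k] * mu[n - k] for k in range(1, n))
        kappa.append(mu[n] - s)
    return kappa[1:]

def poly_mul(f, g):
    """pgf of the sum of two independent variables: convolve the coefficients."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

die = [Fraction(0)] + [Fraction(1, 6)] * 6
one_die = cumulants(moments(die, 5))
two_dice = cumulants(moments(poly_mul(die, die), 5))
print(all(t == 2 * o for t, o in zip(two_dice, one_die)))   # True
print(one_die[0], one_die[1], one_die[2])                   # 7/2 35/12 0
```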


“For these higher 
half-invariants we 
shall propose no 
special names.” 

— T.N. Thiele [351] 
If we take a slightly different tack, writing

$$G(1+t) = 1 + \frac{\alpha_1}{1!}t + \frac{\alpha_2}{2!}t^2 + \frac{\alpha_3}{3!}t^3 + \cdots,$$

equation (8.33) tells us that the $\alpha$'s are the "factorial moments"

$$\alpha_m = G^{(m)}(1) = \sum_{k\ge0}\Pr(X=k)\,k^{\underline m}z^{k-m}\Big|_{z=1} = \sum_{k\ge0}k^{\underline m}\Pr(X=k) = E(X^{\underline m}). \tag{8.49}$$

It follows that

$$\begin{aligned}
G(e^t) &= 1 + \frac{\alpha_1}{1!}(e^t-1) + \frac{\alpha_2}{2!}(e^t-1)^2 + \cdots\\
&= 1 + \frac{\alpha_1}{1!}(t + \tfrac12t^2 + \cdots) + \frac{\alpha_2}{2!}(t^2 + t^3 + \cdots) + \cdots\\
&= 1 + \alpha_1t + \tfrac12(\alpha_2 + \alpha_1)t^2 + \cdots,
\end{aligned}$$

and we can express the cumulants in terms of the derivatives $G^{(m)}(1)$:

$$\kappa_1 = \alpha_1, \tag{8.50}$$
$$\kappa_2 = \alpha_2 + \alpha_1 - \alpha_1^2, \tag{8.51}$$
$$\kappa_3 = \alpha_3 + 3\alpha_2 + \alpha_1 - 3\alpha_2\alpha_1 - 3\alpha_1^2 + 2\alpha_1^3, \tag{8.52}$$
$$\vdots$$

This sequence of formulas yields "additive" identities that extend (8.38) and (8.39) to all the cumulants.

Let's get back down to earth and apply these ideas to simple examples. The simplest case of a random variable is a "random constant," where $X$ has a certain fixed value $x$ with probability 1. In this case $G_X(z) = z^x$, and $\ln G_X(e^t) = xt$; hence the mean is $x$ and all other cumulants are zero. It follows that the operation of multiplying any pgf by $z^x$ increases the mean by $x$ but leaves the variance and all other cumulants unchanged.

How do probability generating functions apply to dice? The distribution of spots on one fair die has the pgf

$$G(z) = \frac{z+z^2+z^3+z^4+z^5+z^6}{6} = zU_6(z),$$

where $U_6$ is the pgf for the uniform distribution of order 6. The factor '$z$' adds 1 to the mean, so the mean is 3.5 instead of $\frac{n-1}{2} = 2.5$ as given in (8.35); but an extra '$z$' does not affect the variance (8.36), which equals $\frac{35}{12}$.

The pgf for total spots on two independent dice is the square of the pgf for spots on one die,

$$G_S(z) = \frac{z^2+2z^3+3z^4+4z^5+5z^6+6z^7+5z^8+4z^9+3z^{10}+2z^{11}+z^{12}}{36} = z^2U_6(z)^2.$$

If we roll a pair of fair dice $n$ times, the probability that we get a total of $k$ spots overall is, similarly,

$$[z^k]\,G_S(z)^n = [z^k]\,z^{2n}U_6(z)^{2n} = [z^{k-2n}]\,U_6(z)^{2n}.$$
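Raising a pgf to the $n$th power is just repeated convolution of coefficient lists, so these probabilities are easy to tabulate. A sketch with exact fractions (helper names are our own):

```python
from fractions import Fraction

def poly_mul(f, g):
    """Multiply two pgf's given as coefficient lists (convolution)."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

die = [Fraction(0)] + [Fraction(1, 6)] * 6     # G(z) = z U_6(z)
pair = poly_mul(die, die)                       # G_S(z) = z^2 U_6(z)^2
print([int(36 * c) for c in pair])   # [0, 0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

def total_spots(n):
    """Coefficient list of G_S(z)^n: entry k is Pr(k spots in n rolls of the pair)."""
    g = [Fraction(1)]
    for _ in range(n):
        g = poly_mul(g, pair)
    return g

probs = total_spots(3)
print(probs[6], sum(probs))          # 1/46656 1
```

Entries below index $2n$ are zero, in line with the factor $z^{2n}$ above.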


In the hats-off-to-football-victory problem considered earlier, otherwise known as the problem of enumerating the fixed points of a random permutation, we know from (5.49) that the pgf is

$$F_n(z) = \sum_{0\le k\le n}\frac{(n-k)¡}{(n-k)!}\,\frac{z^k}{k!}, \qquad\text{for } n\ge0.$$

Therefore

$$F_n'(z) = \sum_{1\le k\le n}\frac{(n-k)¡}{(n-k)!}\,\frac{z^{k-1}}{(k-1)!} = \sum_{0\le k\le n-1}\frac{(n-1-k)¡}{(n-1-k)!}\,\frac{z^k}{k!} = F_{n-1}(z). \tag{8.53}$$

Without knowing the details of the coefficients, we can conclude from this recurrence $F_n'(z) = F_{n-1}(z)$ that $F_n^{(m)}(z) = F_{n-m}(z)$; hence

$$F_n^{(m)}(1) = F_{n-m}(1) = [n\ge m]. \tag{8.54}$$

This formula makes it easy to calculate the mean and variance; we find as before (but more quickly) that they are both equal to 1 when $n\ge2$.

In fact, we can now show that the $m$th cumulant $\kappa_m$ of this random variable is equal to 1 whenever $n\ge m$. For the $m$th cumulant depends only on $F_n'(1), F_n''(1), \ldots, F_n^{(m)}(1)$, and these are all equal to 1; hence we obtain


Hat distribution is 
a different kind of 
uniform distribu- 
tion. 
Con artists know that p ≈ 0.1 when you spin a newly minted U.S. penny on a smooth table. (The weight distribution makes Lincoln's head fall downward.)


the same answer for the $m$th cumulant as we do when we replace $F_n(z)$ by the limiting pgf

$$F_\infty(z) = e^{z-1}, \tag{8.55}$$

which has $F_\infty^{(m)}(1) = 1$ for derivatives of all orders. The cumulants of $F_\infty$ are identically equal to 1, because

$$\ln F_\infty(e^t) = \ln e^{e^t-1} = e^t - 1 = \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots.$$
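Both (8.54) and the claim about cumulants can be checked for a particular $n$. This sketch (function names are our own) builds $F_n(z)$ from (8.22), computing subfactorials by the standard recurrence $m¡ = (m-1)\big((m-1)¡ + (m-2)¡\big)$. It then evaluates the factorial moments $F_n^{(m)}(1)$ and converts moments to cumulants with the same moment-to-cumulant recurrence used earlier. That recurrence is our own choice of method, not the text's.

```python
from fractions import Fraction
from math import comb, factorial

def subfactorial(m):
    """m¡, the number of derangements of m objects."""
    if m == 0:
        return 1
    a, b = 1, 0                              # 0¡ and 1¡
    for j in range(2, m + 1):
        a, b = b, (j - 1) * (a + b)
    return b

def hat_pgf(n):
    """Coefficients of F_n(z): Pr(exactly k fixed points) = (n-k)¡ / ((n-k)! k!)."""
    return [Fraction(subfactorial(n - k), factorial(n - k) * factorial(k))
            for k in range(n + 1)]

def factorial_moment(coeffs, m):
    """F^(m)(1) = sum_k k(k-1)...(k-m+1) Pr(X = k), as in (8.49)."""
    return sum(c * (factorial(k) // factorial(k - m) if k >= m else 0)
               for k, c in enumerate(coeffs))

def cumulants(coeffs, count):
    """kappa_1..kappa_count via kappa_j = mu_j - sum C(j-1, i-1) kappa_i mu_{j-i}."""
    mu = [sum(c * k**j for k, c in enumerate(coeffs)) for j in range(count + 1)]
    kappa = [None]
    for j in range(1, count + 1):
        kappa.append(mu[j] - sum(comb(j - 1, i - 1) * kappa[i] * mu[j - i]
                                 for i in range(1, j)))
    return kappa[1:]

F = hat_pgf(6)
print(*[factorial_moment(F, m) for m in range(1, 9)])   # 1 1 1 1 1 1 0 0
print(*cumulants(F, 6))                                 # 1 1 1 1 1 1
```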

8.4 FLIPPING COINS 

Now let’s turn to processes that have just two outcomes. If we flip 
a coin, there’s probability p that it comes up heads and probability q that it 
comes up tails, where 

p + q = 1 . 

(We assume that the coin doesn’t come to rest on its edge, or fall into a hole, 
etc.) Throughout this section, the numbers $p$ and $q$ will always sum to 1. If the coin is fair, we have $p = q = \frac12$; otherwise the coin is said to be biased.

The probability generating function for the number of heads after one toss of a coin is

$$H(z) = q + pz. \tag{8.56}$$

If we toss the coin $n$ times, always assuming that different coin tosses are independent, the number of heads is generated by

$$H(z)^n = (q+pz)^n = \sum_{k\ge0}\binom nk p^kq^{n-k}z^k, \tag{8.57}$$

according to the binomial theorem. Thus, the chance that we obtain exactly $k$ heads in $n$ tosses is $\binom nk p^kq^{n-k}$. This sequence of probabilities is called the binomial distribution.

Suppose we toss a coin repeatedly until heads first turns up. What is the probability that exactly $k$ tosses will be required? We have $k=1$ with probability $p$ (since this is the probability of heads on the first flip); we have $k=2$ with probability $qp$ (since this is the probability of tails first, then heads); and for general $k$ the probability is $q^{k-1}p$. So the generating function is

$$pz + qpz^2 + q^2pz^3 + \cdots = \frac{pz}{1-qz}. \tag{8.58}$$
Repeating the process until $n$ heads are obtained gives the pgf

$$\left(\frac{pz}{1-qz}\right)^n = p^nz^n\sum_k\binom{n+k-1}{k}q^kz^k. \tag{8.59}$$

This, incidentally, is $z^n$ times

$$\left(\frac{p}{1-qz}\right)^n = \sum_k\binom{n+k-1}{k}p^nq^kz^k, \tag{8.60}$$

the generating function for the negative binomial distribution.

The probability space in example (8.59), where we flip a coin until $n$ heads have appeared, is different from the probability spaces we've seen earlier in this chapter, because it contains infinitely many elements. Each element is a finite sequence of heads and/or tails, containing precisely $n$ heads in all, and ending with heads; the probability of such a sequence is $p^nq^{k-n}$, where $k-n$ is the number of tails. Thus, for example, if $n=3$ and if we write H for heads and T for tails, the sequence THTTTHH is an element of the probability space, and its probability is $qpqqqpp = p^3q^4$.

Let $X$ be a random variable with the binomial distribution (8.57), and let $Y$ be a random variable with the negative binomial distribution (8.60). These distributions depend on $n$ and $p$. The mean of $X$ is $nH'(1) = np$, since its pgf is $H(z)^n$; the variance is


Heads I win,
tails you lose.

No? OK; tails you
lose, heads I win.

No? Well, then,
heads you lose,
tails I win.


$$n\big(H''(1) + H'(1) - H'(1)^2\big) = n(0 + p - p^2) = npq. \tag{8.61}$$

Thus the standard deviation is $\sqrt{npq}$: If we toss a coin $n$ times, we expect to get heads about $np \pm \sqrt{npq}$ times. The mean and variance of $Y$ can be found in a similar way: If we let

$$G(z) = \frac{p}{1-qz},$$

we have

$$G'(z) = \frac{pq}{(1-qz)^2}, \qquad G''(z) = \frac{2pq^2}{(1-qz)^3};$$

hence $G'(1) = pq/p^2 = q/p$ and $G''(1) = 2pq^2/p^3 = 2q^2/p^2$. Since $\text{Var}(G) = 2q^2/p^2 + q/p - q^2/p^2 = q/p^2$, it follows that the mean of $Y$ is $nq/p$ and the variance is $nq/p^2$.
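Both results can be checked numerically from the coefficients themselves. In this sketch (function names are our own) the binomial case is exact, using the terms of (8.57) with a rational $p$. The negative binomial case sums the terms of (8.60) in floating point and cuts the infinite series off after `terms` terms; that cutoff is an arbitrary choice, and the neglected tail is far below rounding error for the parameters used here.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Exact mean and variance of X, whose pgf is (q + p z)^n, from (8.57)."""
    q = 1 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    return mean, sum(k * k * pr for k, pr in enumerate(probs)) - mean**2

def negative_binomial_moments(n, p, terms=3000):
    """Mean and variance of Y, whose pgf is (p/(1 - q z))^n, from the
    coefficients in (8.60), truncated after `terms` terms."""
    q = 1 - p
    m1 = m2 = 0.0
    for k in range(terms):
        pr = comb(n + k - 1, k) * p**n * q**k
        m1 += k * pr
        m2 += k * k * pr
    return m1, m2 - m1 * m1

print(*binomial_moments(10, Fraction(1, 3)))   # 10/3 20/9, i.e. np and npq
print(negative_binomial_moments(4, 0.3))       # close to nq/p = 9.333... and nq/p^2 = 31.11...
```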
A simpler way to derive the mean and variance of $Y$ is to use the reciprocal generating function

$$F(z) = \frac{1-qz}{p} = \frac1p - \frac qp z, \tag{8.62}$$

and to write

$$G(z)^n = F(z)^{-n}. \tag{8.63}$$


The probability is
negative that I'm
getting younger.

Oh? Then it's > 1
that you're getting
older, or staying
the same.


This polynomial $F(z)$ is not a probability generating function, because it has a negative coefficient. But it does satisfy the crucial condition $F(1) = 1$. Thus $F(z)$ is formally a binomial that corresponds to a coin for which we get heads with "probability" equal to $-q/p$; and $G(z)$ is formally equivalent to flipping such a coin $-1$ times(!). The negative binomial distribution with parameters $(n,p)$ can therefore be regarded as the ordinary binomial distribution with parameters $(n',p') = (-n,-q/p)$. Proceeding formally, the mean must be $n'p' = (-n)(-q/p) = nq/p$, and the variance must be $n'p'q' = (-n)(-q/p)(1+q/p) = nq/p^2$. This formal derivation involving negative probabilities is valid, because our derivation for ordinary binomials was based on identities between formal power series in which the assumption $0\le p\le1$ was never used.

Let's move on to another example: How many times do we have to flip a coin until we get heads twice in a row? The probability space now consists of all sequences of H's and T's that end with HH but have no consecutive H's until the final position:

$$\Omega = \{\mathrm{HH}, \mathrm{THH}, \mathrm{TTHH}, \mathrm{HTHH}, \mathrm{TTTHH}, \mathrm{THTHH}, \mathrm{HTTHH}, \ldots\}.$$

The probability of any given sequence is obtained by replacing H by $p$ and T by $q$; for example, the sequence THTHH will occur with probability

$$\Pr(\mathrm{THTHH}) = qpqpp = p^3q^2.$$

We can now play with generating functions as we did at the beginning of Chapter 7, letting $S$ be the infinite sum

$$S = \mathrm{HH} + \mathrm{THH} + \mathrm{TTHH} + \mathrm{HTHH} + \mathrm{TTTHH} + \mathrm{THTHH} + \mathrm{HTTHH} + \cdots$$

of all the elements of $\Omega$. If we replace each H by $pz$ and each T by $qz$, we get the probability generating function for the number of flips needed until two consecutive heads turn up.
There's a curious relation between $S$ and the sum of domino tilings

$$T = \mid + \square + \square\square + \boxminus + \square\square\square + \square\boxminus + \boxminus\square + \cdots$$

in equation (7.1). Indeed, we obtain $S$ from $T$ if we replace each $\square$ by T and each $\boxminus$ by HT, then tack on an HH at the end. This correspondence is easy to prove because each element of $\Omega$ has the form $(\mathrm{T} + \mathrm{HT})^n\,\mathrm{HH}$ for some $n\ge0$, and each term of $T$ has the form $(\square + \boxminus)^n$. Therefore by (7.4) we have

$$S = (1 - \mathrm{T} - \mathrm{HT})^{-1}\,\mathrm{HH},$$

and the probability generating function for our problem is

$$G(z) = \big(1 - qz - (pz)(qz)\big)^{-1}(pz)^2 = \frac{p^2z^2}{1 - qz - pqz^2}. \tag{8.64}$$


Our experience with the negative binomial distribution gives us a clue that we can most easily calculate the mean and variance of (8.64) by writing

$$G(z) = \frac{z^2}{F(z)},$$

where

$$F(z) = \frac{1 - qz - pqz^2}{p^2},$$

and by calculating the "mean" and "variance" of this pseudo-pgf $F(z)$. (Once again we've introduced a function with $F(1) = 1$.) We have

$$F'(1) = (-q - 2pq)/p^2 = 2 - p^{-1} - p^{-2};$$
$$F''(1) = -2pq/p^2 = 2 - 2p^{-1}.$$

Therefore, since $z^2 = F(z)G(z)$, $\text{Mean}(z^2) = 2$, and $\text{Var}(z^2) = 0$, the mean and variance of distribution $G(z)$ are

$$\text{Mean}(G) = 2 - \text{Mean}(F) = p^{-2} + p^{-1}; \tag{8.65}$$
$$\text{Var}(G) = -\text{Var}(F) = p^{-4} + 2p^{-3} - 2p^{-2} - p^{-1}. \tag{8.66}$$

When $p = \frac12$ the mean and variance are 6 and 22, respectively. (Exercise 4 discusses the calculation of means and variances by subtraction.)
Now let's try a more intricate experiment: We will flip coins until the pattern THTTH is first obtained. The sum of winning positions is now

$$\begin{aligned}
S = {}&\mathrm{THTTH} + \mathrm{HTHTTH} + \mathrm{TTHTTH}\\
&+ \mathrm{HHTHTTH} + \mathrm{HTTHTTH} + \mathrm{THTHTTH} + \mathrm{TTTHTTH} + \cdots;
\end{aligned}$$


“ ‘You really are an 
automaton — a cal- 
culating machine, ’ 

I cried. ‘There is 
something positively 
inhuman in you at 
times.’” 

— J. H. Watson [83] 


this sum is more difficult to describe than the previous one. If we go back to 
the method by which we solved the domino problems in Chapter 7, we can 
obtain a formula for S by considering it as a “finite state language” defined 
by the following “automaton” : 



The elementary events in the probability space are the sequences of H’s and 
T’s that lead from state 0 to state 5. Suppose, for example, that we have 
just seen THT; then we are in state 3. Flipping tails now takes us to state 4; 
flipping heads in state 3 would take us to state 2 (not all the way back to 
state 0, since the TH we’ve just seen may be followed by TTH). 

In this formulation, we can let $S_k$ be the sum of all sequences of H's and T's that lead to state $k$; it follows that

$$\begin{aligned}
S_0 &= 1 + S_0\mathrm{H} + S_2\mathrm{H},\\
S_1 &= S_0\mathrm{T} + S_1\mathrm{T} + S_4\mathrm{T},\\
S_2 &= S_1\mathrm{H} + S_3\mathrm{H},\\
S_3 &= S_2\mathrm{T},\\
S_4 &= S_3\mathrm{T},\\
S_5 &= S_4\mathrm{H}.
\end{aligned}$$

Now the sum $S$ in our problem is $S_5$; we can obtain it by solving these six equations in the six unknowns $S_0, S_1, \ldots, S_5$. Replacing H by $pz$ and T by $qz$ gives generating functions where the coefficient of $z^n$ in $S_k$ is the probability that we are in state $k$ after $n$ flips.

In the same way, any diagram of transitions between states, where the transition from state $j$ to state $k$ occurs with given probability $p_{j,k}$, leads to a set of simultaneous linear equations whose solutions are generating functions for the state probabilities after $n$ transitions have occurred. Systems of this kind are called Markov processes, and the theory of their behavior is intimately related to the theory of linear equations.
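The Markov-process view can also be run directly. This sketch is an illustration under our own assumptions, not the text's method. It builds the automaton's transition table by brute-force suffix matching, where state $j$ means that the last $j$ flips agree with the first $j$ characters of the pattern. It then pushes the state probabilities forward one flip at a time, which amounts to reading off the coefficients of $z^n$ in $S_0, \ldots, S_{m-1}$. The probability of first reaching the final state at flip $n$ gives the mean and variance numerically. The cutoff `max_flips` is an arbitrary choice that is ample for short patterns and a fair coin.

```python
def build_automaton(pattern):
    """delta[j][c] = new state after flipping c in state j, where state j means
    that the last j flips agree with the first j characters of `pattern`."""
    m = len(pattern)
    delta = []
    for j in range(m):
        row = {}
        for c in "HT":
            seen = pattern[:j] + c
            k = len(seen)                 # longest suffix of `seen` that is a prefix
            while k > 0 and seen[-k:] != pattern[:k]:
                k -= 1
            row[c] = k
        delta.append(row)
    return delta

def waiting_time_moments(pattern, p=0.5, max_flips=4000):
    """Mean and variance of the number of flips until `pattern` first appears."""
    m = len(pattern)
    delta = build_automaton(pattern)
    probs = {"H": p, "T": 1 - p}
    state = [1.0] + [0.0] * (m - 1)       # before any flips we are in state 0
    m1 = m2 = 0.0
    for n in range(1, max_flips + 1):
        nxt = [0.0] * m
        finish = 0.0                       # Pr(pattern first completed at flip n)
        for j, pr in enumerate(state):
            for c, pc in probs.items():
                k = delta[j][c]
                if k == m:
                    finish += pr * pc
                else:
                    nxt[k] += pr * pc
        m1 += n * finish
        m2 += n * n * finish
        state = nxt
    return m1, m2 - m1 * m1

print(waiting_time_moments("HH"))      # close to (6, 22)
print(waiting_time_moments("THTTH"))   # close to (36, 996)
```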
But the coin-flipping problem can be solved in a much simpler way, without the complexities of the general finite-state approach. Instead of six equations in six unknowns $S_0, S_1, \ldots, S_5$, we can characterize $S$ with only two equations in two unknowns. The trick is to consider the auxiliary sum $N = S_0 + S_1 + S_2 + S_3 + S_4$ of all flip sequences that don't contain any occurrences of the given pattern THTTH:

$$N = 1 + \mathrm{H} + \mathrm{T} + \mathrm{HH} + \cdots + \mathrm{THTHT} + \mathrm{THTTT} + \cdots.$$

We have

$$1 + N(\mathrm{H} + \mathrm{T}) = N + S, \tag{8.67}$$

because every term on the left either ends with THTTH (and belongs to $S$) or doesn't (and belongs to $N$); conversely, every term on the right is either empty or belongs to $N\mathrm{H}$ or $N\mathrm{T}$. And we also have the important additional equation

$$N\,\mathrm{THTTH} = S + S\,\mathrm{TTH}, \tag{8.68}$$

because every term on the left completes a term of $S$ after either the first H or the second H, and because every term on the right belongs to the left.

The solution to these two simultaneous equations is easily obtained: We have $N = (1-S)(1-\mathrm{H}-\mathrm{T})^{-1}$ from (8.67), hence

$$(1-S)(1-\mathrm{T}-\mathrm{H})^{-1}\,\mathrm{THTTH} = S(1 + \mathrm{TTH}).$$


As before, we get the probability generating function $G(z)$ for the number of flips if we replace H by $pz$ and T by $qz$. A bit of simplification occurs since $p+q=1$, and we find

$$\frac{(1-G(z))\,p^2q^3z^5}{1-z} = G(z)(1 + pq^2z^3);$$

hence the solution is

$$G(z) = \frac{p^2q^3z^5}{p^2q^3z^5 + (1+pq^2z^3)(1-z)}. \tag{8.69}$$

Notice that $G(1) = 1$, if $pq\ne0$; we do eventually encounter the pattern THTTH, with probability 1, unless the coin is rigged so that it always comes up heads or always tails.

To get the mean and variance of the distribution (8.69), we invert $G(z)$ as we did in the previous problem, writing $G(z) = z^5/F(z)$ where $F$ is a polynomial:

$$F(z) = \frac{p^2q^3z^5 + (1+pq^2z^3)(1-z)}{p^2q^3}. \tag{8.70}$$

The relevant derivatives are

$$F'(1) = 5 - (1+pq^2)/p^2q^3,$$
$$F''(1) = 20 - 6pq^2/p^2q^3;$$

and if $X$ is the number of flips we get

$$EX = \text{Mean}(G) = 5 - \text{Mean}(F) = p^{-2}q^{-3} + p^{-1}q^{-1}; \tag{8.71}$$

$$\begin{aligned}
VX = \text{Var}(G) &= -\text{Var}(F)\\
&= -25 + p^{-2}q^{-3} + 7p^{-1}q^{-1} + \text{Mean}(F)^2\\
&= (EX)^2 - 9p^{-2}q^{-3} - 3p^{-1}q^{-1}.
\end{aligned} \tag{8.72}$$

When $p = \frac12$, the mean and variance are 36 and 996.

Let's get general: The problem we have just solved was "random" enough to show us how to analyze the case that we are waiting for the first appearance of an arbitrary pattern $A$ of heads and tails. Again we let $S$ be the sum of all winning sequences of H's and T's, and we let $N$ be the sum of all sequences that haven't encountered the pattern $A$ yet. Equation (8.67) will remain the same; equation (8.68) will become

$$NA = S\big(1 + A^{(1)}[A^{(m-1)} = A_{(m-1)}] + A^{(2)}[A^{(m-2)} = A_{(m-2)}] + \cdots + A^{(m-1)}[A^{(1)} = A_{(1)}]\big), \tag{8.73}$$

where $m$ is the length of $A$, and where $A^{(k)}$ and $A_{(k)}$ denote respectively the last $k$ characters and the first $k$ characters of $A$. For example, if $A$ is the pattern THTTH we just studied, we have

$$A^{(1)} = \mathrm{H}, \quad A^{(2)} = \mathrm{TH}, \quad A^{(3)} = \mathrm{TTH}, \quad A^{(4)} = \mathrm{HTTH};$$
$$A_{(1)} = \mathrm{T}, \quad A_{(2)} = \mathrm{TH}, \quad A_{(3)} = \mathrm{THT}, \quad A_{(4)} = \mathrm{THTT}.$$

Since the only perfect match is $A^{(2)} = A_{(2)}$, equation (8.73) reduces to (8.68).

Let $\tilde A$ be the result of substituting $p^{-1}$ for H and $q^{-1}$ for T in the pattern $A$. Then it is not difficult to generalize our derivation of (8.71) and (8.72) to conclude (exercise 20) that the general mean and variance are

$$EX = \sum_{k=1}^m \tilde A_{(k)}\,[A^{(k)} = A_{(k)}]; \tag{8.74}$$
$$VX = (EX)^2 - \sum_{k=1}^m (2k-1)\,\tilde A_{(k)}\,[A^{(k)} = A_{(k)}]. \tag{8.75}$$
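Formulas (8.74) and (8.75) need only the list of overlaps $k$ with $A^{(k)} = A_{(k)}$. A sketch with exact fractions; `tilde` is our own hypothetical helper that computes $\tilde A_{(k)}$ from the first $k$ characters of the pattern.

```python
from fractions import Fraction

def tilde(s, p):
    """Substitute 1/p for each H and 1/q for each T in s, and multiply."""
    q = 1 - p
    value = Fraction(1)
    for c in s:
        value /= p if c == "H" else q
    return value

def pattern_mean_var(A, p=Fraction(1, 2)):
    """EX and VX from (8.74) and (8.75) for the waiting time until pattern A."""
    overlaps = [k for k in range(1, len(A) + 1) if A[-k:] == A[:k]]   # A^(k) = A_(k)
    mean = sum(tilde(A[:k], p) for k in overlaps)
    variance = mean ** 2 - sum((2 * k - 1) * tilde(A[:k], p) for k in overlaps)
    return mean, variance

print(*pattern_mean_var("HH"))      # 6 22
print(*pattern_mean_var("THTTH"))   # 36 996
```

With a biased coin, for instance `p=Fraction(1, 3)`, the results agree with (8.65), (8.66), (8.71) and (8.72).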
In the special case $p = \frac12$ we can interpret these formulas in a particularly simple way. Given a pattern $A$ of $m$ heads and tails, let

$$A{:}A = \sum_{k=1}^m 2^{k-1}[A^{(k)} = A_{(k)}]. \tag{8.76}$$

We can easily find the binary representation of this number by placing a '1' under each position such that the string matches itself perfectly when it is superimposed on a copy of itself that has been shifted to start in this position:

A = HTHTHHTHTH
A:A = (1000010101)₂ = 512 + 16 + 4 + 1 = 533

HTHTHHTHTH           ✓
 HTHTHHTHTH
  HTHTHHTHTH
   HTHTHHTHTH
    HTHTHHTHTH
     HTHTHHTHTH      ✓
      HTHTHHTHTH
       HTHTHHTHTH    ✓
        HTHTHHTHTH
         HTHTHHTHTH  ✓

Equation (8.74) now tells us that the expected number of flips until pattern $A$ appears is exactly $2(A{:}A)$, if we use a fair coin, because $\tilde A_{(k)} = 2^k$ when $p = q = \frac12$. This result, first discovered by the Soviet mathematician A. D. Solov'ev in 1966 [331], seems paradoxical at first glance: Patterns with no self-overlaps occur sooner than overlapping patterns do! It takes almost twice as long to encounter HHHHH as it does to encounter HHHHT or THHHH.
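The correlation $A{:}A$ is a one-line computation on strings. The sketch below (the function name is our own) reproduces the value 533 for HTHTHHTHTH and the fair-coin waiting times $2(A{:}A)$ for the three five-flip patterns just mentioned.

```python
def self_correlation(A):
    """A:A from (8.76): add 2^(k-1) whenever the last k characters of A equal the first k."""
    return sum(2 ** (k - 1) for k in range(1, len(A) + 1) if A[-k:] == A[:k])

for pattern in ("HTHTHHTHTH", "HHHHH", "HHHHT", "THHHH"):
    aa = self_correlation(pattern)
    print(pattern, bin(aa), aa, 2 * aa)   # last column: expected flips with a fair coin
```

This prints 533 (binary 1000010101) and 1066 for the first pattern, and expected waiting times 62, 32 and 32 for the other three.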

Now let’s consider an amusing game that was invented by (of all people) 
Walter Penney [289] in 1969. Alice and Bill flip a coin until either HHT or 
HTT occurs; Alice wins if the pattern HHT comes first, Bill wins if HTT comes 
first. This game — now called “Penney ante” — certainly seems to be fair, if 
played with a fair coin, because both patterns HHT and HTT have the same 
characteristics if we look at them in isolation: The probability generating 
function for the waiting time until HHT first occurs is 


Of course not! Who 
could they have an 
advantage over? 


$$G(z) = \frac{z^3}{z^3 - 8(z-1)},$$

and the same is true for HTT. Therefore neither Alice nor Bill has an advantage, if they play solitaire.


“The more periods our word has, the later it appears.”
(Original: “Chem bol’she periodov u nashego slova, tem pozzhe ono poiavliaetsia.”)
— A. D. Solov’ev 
But there's an interesting interplay between the patterns when both are considered simultaneously. Let $S_A$ be the sum of Alice's winning configurations, and let $S_B$ be the sum of Bill's:

$$S_A = \mathrm{HHT} + \mathrm{HHHT} + \mathrm{THHT} + \mathrm{HHHHT} + \mathrm{HTHHT} + \mathrm{THHHT} + \cdots;$$
$$S_B = \mathrm{HTT} + \mathrm{THTT} + \mathrm{HTHTT} + \mathrm{TTHTT} + \mathrm{THTHTT} + \mathrm{TTTHTT} + \cdots.$$

Also, taking our cue from the trick that worked when only one pattern was involved, let us denote by $N$ the sum of all sequences in which neither player has won so far:

$$N = 1 + \mathrm{H} + \mathrm{T} + \mathrm{HH} + \mathrm{HT} + \mathrm{TH} + \mathrm{TT} + \mathrm{HHH} + \mathrm{HTH} + \mathrm{THH} + \cdots. \tag{8.77}$$

Then we can easily verify the following set of equations:

$$\begin{aligned}
1 + N(\mathrm{H}+\mathrm{T}) &= N + S_A + S_B;\\
N\,\mathrm{HHT} &= S_A;\\
N\,\mathrm{HTT} &= S_A\mathrm{T} + S_B.
\end{aligned} \tag{8.78}$$

If we now set $\mathrm{H} = \mathrm{T} = \frac12$, the resulting value of $S_A$ becomes the probability that Alice wins, and $S_B$ becomes the probability that Bill wins. The three equations reduce to

$$1 + N = N + S_A + S_B; \qquad \tfrac18N = S_A; \qquad \tfrac18N = \tfrac12S_A + S_B;$$

and we find $S_A = \frac23$, $S_B = \frac13$. Alice will win about twice as often as Bill!
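A Monte Carlo check of Alice's 2-to-1 advantage. The seed and the number of games in this sketch are arbitrary choices, so the printed frequency is only close to $\frac23$, not equal to it; the function name `play` is our own.

```python
import random

def play(alice, bill, rng):
    """Flip a fair coin until one of the two patterns appears; return the winner."""
    window = ""
    longest = max(len(alice), len(bill))
    while True:
        window = (window + rng.choice("HT"))[-longest:]   # keep only the recent flips
        if window.endswith(alice):
            return "Alice"
        if window.endswith(bill):
            return "Bill"

rng = random.Random(2024)
trials = 100_000
alice_wins = sum(play("HHT", "HTT", rng) == "Alice" for _ in range(trials))
print(alice_wins / trials)   # close to 2/3
```

Since neither pattern occurs within the other, the two `endswith` tests can never both succeed on the same flip, so the order of the checks does not matter.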

In a generalization of this game, Alice and Bill choose patterns A and B 
of heads and tails, and they flip coins until either A or B appears. The 
two patterns need not have the same length, but we assume that A doesn’t 
occur within B, nor does B occur within A. (Otherwise the game would be 
degenerate. For example, if A = HT and B = THTH, poor Bill could never win; 
and if A = HTH and B = TH, both players might claim victory simultaneously.) 
Then we can write three equations analogous to (8.73) and (8.78): 

$$\begin{aligned}
1 + N(\mathrm{H}+\mathrm{T}) &= N + S_A + S_B;\\
NA &= S_A\sum_{k=1}^{l}A^{(l-k)}[A^{(k)} = A_{(k)}] + S_B\sum_{k=1}^{\min(l,m)}A^{(l-k)}[B^{(k)} = A_{(k)}];\\
NB &= S_A\sum_{k=1}^{\min(l,m)}B^{(m-k)}[A^{(k)} = B_{(k)}] + S_B\sum_{k=1}^{m}B^{(m-k)}[B^{(k)} = B_{(k)}].
\end{aligned} \tag{8.79}$$
Here $l$ is the length of $A$ and $m$ is the length of $B$. For example, if we have A = HTTHTHTH and B = THTHTTH, the two pattern-dependent equations are

$$N\,\mathrm{HTTHTHTH} = S_A\,\mathrm{TTHTHTH} + S_A + S_B\,\mathrm{TTHTHTH} + S_B\,\mathrm{THTH};$$

$$N\,\mathrm{THTHTTH} = S_A\,\mathrm{THTTH} + S_A\,\mathrm{TTH} + S_B\,\mathrm{THTTH} + S_B.$$

We obtain the victory probabilities by setting $\mathrm{H} = \mathrm{T} = \frac12$, if we assume that a fair coin is being used; this reduces the two crucial equations to

$$\begin{aligned}
N &= S_A \sum_{k=1}^{l} 2^k\,[A^{(k)} = A_{(k)}] + S_B \sum_{k=1}^{\min(l,m)} 2^k\,[B^{(k)} = A_{(k)}];\\
N &= S_A \sum_{k=1}^{\min(l,m)} 2^k\,[A^{(k)} = B_{(k)}] + S_B \sum_{k=1}^{m} 2^k\,[B^{(k)} = B_{(k)}].
\end{aligned}\tag{8.80}$$


We can see what’s going on if we generalize the A:A operation of (8.76) to a function of two independent strings $A$ and $B$:

$$A{:}B = \sum_{k=1}^{\min(l,m)} 2^{k-1}\,[A^{(k)} = B_{(k)}]. \tag{8.81}$$

Equations (8.80) now become simply

$$S_A(A{:}A) + S_B(B{:}A) = S_A(A{:}B) + S_B(B{:}B);$$

the odds in Alice’s favor are

$$\frac{S_A}{S_B} = \frac{B{:}B - B{:}A}{A{:}A - A{:}B}. \tag{8.82}$$

(This beautiful formula was discovered by John Horton Conway [137].)

For example, if A = HTTHTHTH and B = THTHTTH as above, we have $A{:}A = (10000001)_2 = 129$, $A{:}B = (0001010)_2 = 10$, $B{:}A = (0001001)_2 = 9$, and $B{:}B = (1000010)_2 = 66$; so the ratio $S_A/S_B$ is $(66 - 9)/(129 - 10) = 57/119$.

Alice will win this one only 57 times out of every 176, on the average.
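Conway's formula (8.82) is easy to mechanize. This Python sketch assumes a fair coin and that neither pattern occurs inside the other. `correlation` computes the text's A:B from (8.81): bit k − 1 is set when the last k characters of A match the first k characters of B. `alice_odds` is a hypothetical name for the ratio $S_A/S_B$.

```python
from fractions import Fraction


def correlation(a: str, b: str) -> int:
    """Conway's A:B of (8.81): add 2**(k-1) whenever the last k
    characters of a equal the first k characters of b."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(a), len(b)) + 1)
               if a[-k:] == b[:k])


def alice_odds(a: str, b: str) -> Fraction:
    """Odds S_A/S_B in favor of pattern a, by (8.82), for a fair coin.
    Assumes neither pattern occurs within the other."""
    return Fraction(correlation(b, b) - correlation(b, a),
                    correlation(a, a) - correlation(a, b))


A, B = "HTTHTHTH", "THTHTTH"
print(correlation(A, A), correlation(A, B), correlation(B, A), correlation(B, B))
# 129 10 9 66
odds = alice_odds(A, B)
print(odds, odds / (1 + odds))   # 57/119 57/176
```

Converting odds r into a probability uses r/(1 + r), which gives the "57 times out of every 176" above.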

Strange things can happen in Penney’s game. For example, the pattern HHTH wins over the pattern HTHH with 3/2 odds, and HTHH wins over THHH with 7/5 odds. So HHTH ought to be much better than THHH. Yet THHH actually wins over HHTH, with 7/5 odds! The relation between patterns is not transitive. In fact, exercise 57 proves that if Alice chooses any pattern $\tau_1\tau_2\ldots\tau_l$ of length $l \ge 3$, Bill can always ensure better than even chances of winning if he chooses the pattern $\bar\tau_2\tau_1\tau_2\ldots\tau_{l-1}$, where $\bar\tau_2$ is the heads/tails opposite of $\tau_2$.

Odd, odd.
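The nontransitive cycle and Bill's counterstrategy can be checked with the same correlation arithmetic. In this sketch, `odds(a, b)` is a hypothetical name for the odds that pattern a appears before pattern b, computed by (8.82). `bill_reply` builds $\bar\tau_2\tau_1\ldots\tau_{l-1}$. The loop only checks lengths 3 and 4, so it is evidence rather than a proof.

```python
from fractions import Fraction
from itertools import product


def correlation(a: str, b: str) -> int:
    """Conway's A:B: sum of 2**(k-1) over k with a[-k:] == b[:k]."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(a), len(b)) + 1)
               if a[-k:] == b[:k])


def odds(a: str, b: str) -> Fraction:
    """Odds that pattern a appears before pattern b, by (8.82)."""
    return Fraction(correlation(b, b) - correlation(b, a),
                    correlation(a, a) - correlation(a, b))


def bill_reply(alice: str) -> str:
    """Bill's choice: the opposite of tau_2, followed by tau_1 ... tau_{l-1}."""
    opposite = {"H": "T", "T": "H"}
    return opposite[alice[1]] + alice[:-1]


# The nontransitive cycle described in the text.
print(odds("HHTH", "HTHH"), odds("HTHH", "THHH"), odds("THHH", "HHTH"))   # 3/2 7/5 7/5

# Bill's reply against every Alice pattern of length 3 and 4.
for length in (3, 4):
    for alice in map("".join, product("HT", repeat=length)):
        bill = bill_reply(alice)
        print(alice, bill, odds(bill, alice))   # Bill's odds exceed 1 in every row
```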
“Somehow the verb ‘to hash’ magically became standard terminology for key transformation during the mid-1960s, yet nobody was rash enough to use such an undignified word publicly until 1967.”
— D. E. Knuth [209]


8.5 HASHING 

Let’s conclude this chapter by applying probability theory to com- 
puter programming. Several important algorithms for storing and retrieving 
information inside a computer are based on a technique called “hashing.” 
The general problem is to maintain a set of records that each contain a “key” 
value, K, and some data D(K) about that key; we want to be able to find 
D(K) quickly when K is given. For example, each key might be the name of 
a student, and the associated data might be that student’s homework grades. 

In practice, computers don’t have enough capacity to set aside one memory cell for every possible key; billions of keys are possible, but comparatively few keys are actually present in any one application. One solution to the problem is to maintain two tables KEY[j] and DATA[j] for 1 ≤ j ≤ N, where N is the total number of records that can be accommodated; another variable n tells how many records are actually present. Then we can search for a given key K by going through the table sequentially in an obvious way:

S1 Set j := 1. (We’ve searched through all positions < j.)

S2 If j > n, stop. (The search was unsuccessful.)

S3 If KEY[j] = K, stop. (The search was successful.)

S4 Increase j by 1 and return to step S2. (We’ll try again.)

After a successful search, the desired data entry D(K) appears in DATA[j]. 
After an unsuccessful search, we can insert K and D(K) into the table by 
setting 

n := j, KEY [n] := K, DATA [n] := D(K), 

assuming that the table was not already filled to capacity. 
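Steps S1 through S4 and the insertion rule translate almost line for line into Python. This is a minimal sketch. It assumes 0-based list positions, so the text's position j becomes j − 1, and it uses placeholder strings for the confidential D(K) values. The counter confirms that an unsuccessful search executes step S2 exactly n + 1 times.

```python
def sequential_search(KEY: list, n: int, K) -> tuple[int, bool, int]:
    """Steps S1-S4 with 0-based positions 0..n-1 (the text uses 1..n).

    Returns (j, found, s2_count). If found, KEY[j] == K; otherwise j == n,
    the first free slot. s2_count is how many times step S2 was executed.
    """
    j = 0                      # S1: nothing has been searched yet
    s2_count = 0
    while True:
        s2_count += 1
        if j >= n:             # S2: all n records checked, search unsuccessful
            return j, False, s2_count
        if KEY[j] == K:        # S3: search successful
            return j, True, s2_count
        j += 1                 # S4: move on to the next position


N = 20                          # capacity; an arbitrary choice for this example
KEY, DATA, n = [None] * N, [None] * N, 0
for name in ["Nora", "Glenn", "Jim"]:
    j, found, _ = sequential_search(KEY, n, name)
    if not found and n < N:     # insert after an unsuccessful search
        KEY[j], DATA[j] = name, f"D({name})"   # placeholder data
        n = j + 1

print(sequential_search(KEY, n, "John"))   # (3, False, 4): S2 ran n + 1 = 4 times
```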

This method works, but it can be dreadfully slow; we need to repeat 
step S2 a total of n + 1 times whenever an unsuccessful search is made, and 
n can be quite large. 

Hashing was invented to speed things up. The basic idea, in one of its popular forms, is to use m separate lists instead of one giant list. A “hash function” transforms every possible key K into a list number h(K) between 1 and m. An auxiliary table FIRST[i] for 1 ≤ i ≤ m points to the first record in list i; another auxiliary table NEXT[j] for 1 ≤ j ≤ N points to the record following record j in its list. We assume that

FIRST[i] = −1, if list i is empty;
NEXT[j] = 0, if record j is the last in its list.

As before, there’s a variable n that tells how many records have been stored altogether.
For example, suppose the keys are names, and suppose that there are m = 4 lists based on the first letter of a name:

$$h(\text{name}) = \begin{cases} 1, & \text{for A–F;}\\ 2, & \text{for G–L;}\\ 3, & \text{for M–R;}\\ 4, & \text{for S–Z.} \end{cases}$$

We start with four empty lists and with n = 0. If, say, the first record has 
Nora as its key, we have h(Nora) = 3, so Nora becomes the key of the first 
item in list 3. If the next two names are Glenn and Jim, they both go into 
list 2. Now the tables in memory look like this: 


FIRST[1] = −1,   KEY[1] = Nora,    NEXT[1] = 0,
FIRST[2] = 2,    KEY[2] = Glenn,   NEXT[2] = 3,
FIRST[3] = 1,    KEY[3] = Jim,     NEXT[3] = 0,
FIRST[4] = −1,   n = 3.


(The values of DATA[1], DATA [2], and DATA [3] are confidential and will not 
be shown.) After 18 records have been inserted, the lists might contain the 
names 


list 1    list 2     list 3    list 4
Dianne    Glenn      Nora      Scott
Ari       Jim        Mike      Tina
Brian     Jennifer   Michael
Fran      Joan       Ray
Doug      Jerry      Paula
          Jean

and these names would appear intermixed in the KEY array with NEXT entries 
to keep the lists effectively separate. If we now want to search for John, we 
have to scan through the six names in list 2 (which happens to be the longest 
list); but that’s not nearly as bad as looking at all 18 names. 

Here’s a precise specification of the algorithm that searches for key K in 
accordance with this scheme: 

H1 Set i := h(K) and j := FIRST[i].

H2 If j ≤ 0, stop. (The search was unsuccessful.)

H3 If KEY[j] = K, stop. (The search was successful.)

H4 Set i := j, then set j := NEXT[i] and return to step H2. (We’ll try again.)

For example, to search for Jennifer in the example given, step H1 would set i := 2 and j := 2; step H3 would find that Glenn ≠ Jennifer; step H4 would set j := 3; and step H3 would find Jim ≠ Jennifer. One more iteration of steps H4 and H3 would locate Jennifer in the table.


Let’s hear it for 
the Concrete Math 
students who sat in 
the front rows and 
lent their names to 
this experiment. 


I bet their parents 
are glad about that. 
After a successful search, the desired data D(K) appears in DATA[j], as in the previous algorithm. After an unsuccessful search, we can enter K and D(K) in the table by doing the following operations:

n := n + 1; 

if j < 0 then FIRST [i] := n else NEXT [i] := n; 

KEY [n] := K; DATA [n] := D(K); NEXT [n] := 0. (8.83) 

Now the table will once again be up to date. 
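Here is a minimal Python sketch of algorithm H together with the insertion operations (8.83). It keeps the text's 1-based arrays FIRST, KEY, DATA, and NEXT, leaving slot 0 unused. It also reuses the variable i exactly as the algorithm does: after an unsuccessful search, i is either a list number (when j < 0) or the last record of the list. The class name, `first_letter_hash`, and the insertion order of the 18 names are assumptions, chosen to reproduce the example lists.

```python
class ChainedHashTable:
    """Algorithm H plus insertion (8.83), using the text's 1-based arrays.

    FIRST[1..m] holds the first record of each list (-1 if the list is empty);
    KEY, DATA, NEXT are indexed 1..capacity; index 0 is unused.
    """

    def __init__(self, m: int, capacity: int, h):
        self.h, self.capacity = h, capacity
        self.FIRST = [None] + [-1] * m
        self.KEY = [None] * (capacity + 1)
        self.DATA = [None] * (capacity + 1)
        self.NEXT = [None] * (capacity + 1)
        self.n = 0

    def search(self, K):
        """Steps H1-H4; returns (found, i, j, probes), probes = executions of H3."""
        i = self.h(K)
        j = self.FIRST[i]                  # H1
        probes = 0
        while j > 0:                       # H2: j <= 0 means unsuccessful
            probes += 1
            if self.KEY[j] == K:           # H3: successful
                return True, i, j, probes
            i = j
            j = self.NEXT[i]               # H4
        return False, i, j, probes

    def insert(self, K, D):
        found, i, j, _ = self.search(K)
        if found:
            return
        if self.n >= self.capacity:
            raise OverflowError("table is full")
        self.n += 1                        # the operations of (8.83)
        if j < 0:
            self.FIRST[i] = self.n         # list i was empty
        else:
            self.NEXT[i] = self.n          # record i was the last in its list
        self.KEY[self.n], self.DATA[self.n], self.NEXT[self.n] = K, D, 0


def first_letter_hash(name: str) -> int:
    """The example hash: A-F -> 1, G-L -> 2, M-R -> 3, S-Z -> 4."""
    c = name[0].upper()
    return 1 if c <= "F" else 2 if c <= "L" else 3 if c <= "R" else 4


table = ChainedHashTable(4, 18, first_letter_hash)
for name in ["Nora", "Glenn", "Jim", "Dianne", "Scott", "Ari", "Mike", "Tina",
             "Brian", "Jennifer", "Michael", "Fran", "Joan", "Ray", "Doug",
             "Jerry", "Paula", "Jean"]:
    table.insert(name, None)               # DATA stays confidential
print(table.search("Jennifer")[3], table.search("John")[3])   # 3 6
```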

We hope to get lists of roughly equal length, because this will make the 
task of searching about m times faster. The value of m is usually much greater 
than 4, so a factor of 1 /m will be a significant improvement. 

We don’t know in advance what keys will be present, but it is generally 
possible to choose the hash function h so that we can consider h(K) to be a 
random variable that is uniformly distributed between 1 and m, independent 
of the hash values of other keys that are present. In such cases computing the 
hash function is like rolling a die that has m faces. There’s a chance that all 
the records will fall into the same list, just as there’s a chance that a die will 
always turn up the same face; but probability theory tells us that the lists will almost
always be pretty evenly balanced. 

Analysis of Hashing: Introduction. 

“Algorithmic analysis” is a branch of computer science that derives quan- 
titative information about the efficiency of computer methods. “Probabilistic 
analysis of an algorithm” is the study of an algorithm’s running time, con- 
sidered as a random variable that depends on assumed characteristics of the 
input data. Hashing is an especially good candidate for probabilistic analysis, 
because it is an extremely efficient method on the average, even though its 
worst case is too horrible to contemplate. (The worst case occurs when all 
keys have the same hash value.) Indeed, a computer programmer who uses 
hashing had better be a believer in probability theory. 

Let P be the number of times step H3 is performed when the algorithm 
above is used to carry out a search. (Each execution of H3 is called a “probe” 
in the table.) If we know P, we know how often each step is performed, 
depending on whether the search is successful or unsuccessful: 


Step   Unsuccessful search   Successful search
H1     1 time                1 time
H2     P + 1 times           P times
H3     P times               P times
H4     P times               P − 1 times
Thus the main quantity that governs the running time of the search procedure is the number of probes, P.

We can get a good mental picture of the algorithm by imagining that we 
are keeping an address book that is organized in a special way, with room for 
only one entry per page. On the cover of the book we note down the page 
number for the first entry in each of m lists; each name K determines the list 
h(K) that it belongs to. Every page inside the book refers to the successor 
page in its list. The number of probes needed to find an address in such a 
book is the number of pages we must consult. 

If n items have been inserted, their positions in the table depend only on their respective hash values, $(h_1, h_2, \ldots, h_n)$. Each of the $m^n$ possible sequences $(h_1, h_2, \ldots, h_n)$ is considered to be equally likely, and P is a random variable depending on such a sequence.

Case 1: The key is not present.

Let’s consider first the behavior of P in an unsuccessful search, assuming that n records have previously been inserted into the hash table. In this case the relevant probability space consists of $m^{n+1}$ elementary events

$$\omega = (h_1, h_2, \ldots, h_n, h_{n+1}),$$

where $h_j$ is the hash value of the jth key inserted, and where $h_{n+1}$ is the hash value of the key for which the search is unsuccessful. We assume that the hash function h has been chosen properly so that $\Pr(\omega) = 1/m^{n+1}$ for every such ω.

For example, if m = n = 2, there are eight equally likely possibilities:

h1 h2 h3 : P
 1  1  1 : 2
 1  1  2 : 0
 1  2  1 : 1
 1  2  2 : 1
 2  1  1 : 1
 2  1  2 : 1
 2  2  1 : 0
 2  2  2 : 2

If $h_1 = h_2 = h_3$ we make two unsuccessful probes before concluding that the new key K is not present; if $h_1 = h_2 \ne h_3$ we make none; and so on. This list of all possibilities shows that P has a probability distribution given by the pgf $(\frac28 + \frac48 z + \frac28 z^2) = (\frac12 + \frac12 z)^2$, when m = n = 2.

An unsuccessful search makes one probe for every item in list number $h_{n+1}$, so we have the general formula

$$P = [h_1 = h_{n+1}] + [h_2 = h_{n+1}] + \cdots + [h_n = h_{n+1}]. \tag{8.84}$$

Check under the doormat.
The probability that $h_j = h_{n+1}$ is 1/m, for $1 \le j \le n$; so it follows that

$$EP = E[h_1 = h_{n+1}] + E[h_2 = h_{n+1}] + \cdots + E[h_n = h_{n+1}] = \frac{n}{m}.$$

Maybe we should do that more slowly: Let $X_j$ be the random variable

$$X_j = X_j(\omega) = [h_j = h_{n+1}].$$

Then $P = X_1 + \cdots + X_n$, and $EX_j = 1/m$ for all $j \le n$; hence $EP = EX_1 + \cdots + EX_n = n/m$.


Good: As we had hoped, the average number of probes is 1/m times what it was without hashing. Furthermore the random variables $X_j$ are independent, and they each have the same probability generating function

$$X_j(z) = \frac{m - 1 + z}{m};$$

therefore the pgf for the total number of probes in an unsuccessful search is

$$P(z) = X_1(z) \ldots X_n(z) = \Bigl(\frac{m - 1 + z}{m}\Bigr)^{n}. \tag{8.85}$$


This is a binomial distribution, with p = 1/m and q = (m − 1)/m; in other words, the number of probes in an unsuccessful search behaves just like the number of heads when we toss a biased coin whose probability of heads is 1/m on each toss. Equation (8.61) tells us that the variance of P is therefore

$$npq = \frac{n(m-1)}{m^2}.$$

When m is large, the variance of P is approximately n/m, so the standard deviation is approximately $\sqrt{n/m}$.
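The claim that P is binomial can be checked by brute force for small m and n. This sketch enumerates all $m^{n+1}$ equally likely hash sequences and evaluates (8.84) exactly with Python's `fractions.Fraction`. It then compares the result with the coefficients of (8.85). The choice m = 3, n = 4 is arbitrary.

```python
from fractions import Fraction
from itertools import product
from math import comb


def unsuccessful_probe_distribution(m: int, n: int) -> dict[int, Fraction]:
    """Exact distribution of P in (8.84): enumerate all m**(n+1) equally
    likely hash sequences (h_1, ..., h_n, h_{n+1})."""
    counts: dict[int, int] = {}
    for hs in product(range(1, m + 1), repeat=n + 1):
        p = sum(hs[j] == hs[n] for j in range(n))    # one probe per key in list h_{n+1}
        counts[p] = counts.get(p, 0) + 1
    total = m ** (n + 1)
    return {p: Fraction(c, total) for p, c in sorted(counts.items())}


def mean_var(dist):
    """Mean and variance of a finite distribution {value: probability}."""
    mean = sum(p * pr for p, pr in dist.items())
    var = sum(p * p * pr for p, pr in dist.items()) - mean ** 2
    return mean, var


m, n = 3, 4
dist = unsuccessful_probe_distribution(m, n)
# Coefficients of ((m - 1 + z)/m)**n from (8.85).
binom = {p: Fraction(comb(n, p) * (m - 1) ** (n - p), m ** n) for p in range(n + 1)}
print(dist == binom)       # True
print(mean_var(dist))      # (Fraction(4, 3), Fraction(8, 9)), i.e. n/m and n(m-1)/m^2
```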

Case 2: The key is present. 

Now let’s look at successful searches. In this case the appropriate probability space is a bit more complicated, depending on our application: We will let Ω be the set of all elementary events

$$\omega = (h_1, \ldots, h_n; k), \tag{8.86}$$

where $h_j$ is the hash value for the jth key as before, and where k is the index of the key being sought (the key whose hash value is $h_k$). Thus we have $1 \le h_j \le m$ for $1 \le j \le n$, and $1 \le k \le n$; there are $m^n \cdot n$ elementary events ω in all.
Let $s_j$ be the probability that we are searching for the jth key that was inserted into the table. Then

$$\Pr(\omega) = s_k/m^n \tag{8.87}$$

if ω is the event (8.86). (Some applications search most often for the items that were inserted first, or for the items that were inserted last, so we will not assume that each $s_j = 1/n$.) Notice that $\sum_{\omega\in\Omega}\Pr(\omega) = \sum_{k=1}^{n} s_k = 1$, hence (8.87) defines a legal probability distribution.

The number of probes P in a successful search is p if key K was the pth key to be inserted into its list. Therefore

$$P(h_1, \ldots, h_n; k) = [h_1 = h_k] + [h_2 = h_k] + \cdots + [h_k = h_k]; \tag{8.88}$$

or, if we let $X_j$ be the random variable $[h_j = h_k]$, we have

$$P = X_1 + X_2 + \cdots + X_k. \tag{8.89}$$

Suppose, for example, that we have m = 10 and n = 16, and that the hash values have the following “random” pattern:

$$\begin{aligned}
(h_1, \ldots, h_{16}) &= 3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9\,7\,9\,3;\\
(P_1, \ldots, P_{16}) &= 1\,1\,1\,2\,1\,1\,1\,1\,2\,2\,3\,1\,2\,1\,3\,3.
\end{aligned}$$

The number of probes $P_j$ needed to find the jth key is shown below $h_j$.
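The probe counts in this example are running tallies per list. $P_j$ is the number of indices $i \le j$ with $h_i = h_j$, which is (8.88) with k = j. A short Python sketch (the function name is hypothetical):

```python
def probes_per_key(hashes):
    """P_j for each key: its position within its own list, i.e. (8.88) with k = j."""
    so_far = {}
    result = []
    for h in hashes:
        so_far[h] = so_far.get(h, 0) + 1
        result.append(so_far[h])
    return result


hashes = [int(d) for d in "3141592653589793"]   # (h_1, ..., h_16) from the example
P = probes_per_key(hashes)
print("".join(map(str, P)))                     # 1112111122312133
```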

Equation (8.89) represents P as a sum of random variables, but we can’t simply calculate EP as $EX_1 + \cdots + EX_k$ because the quantity k itself is a random variable. What is the probability generating function for P? To answer this question we should digress a moment to talk about conditional probability.

If A and B are events in a probability space, we say that the conditional probability of A, given B, is

$$\Pr(\omega\in A \mid \omega\in B) = \frac{\Pr(\omega\in A\cap B)}{\Pr(\omega\in B)}. \tag{8.90}$$


For example, if X and Y are random variables, the conditional probability of the event X = x, given that Y = y, is

$$\Pr(X = x \mid Y = y) = \frac{\Pr(X = x \text{ and } Y = y)}{\Pr(Y = y)}. \tag{8.91}$$


For any fixed y in the range of Y, the sum of these conditional probabilities over all x in the range of X is Pr(Y = y)/Pr(Y = y) = 1; therefore (8.91) defines a probability distribution, and we can define a new random variable ‘X|y’ such that Pr((X|y) = x) = Pr(X = x | Y = y).


Where have I seen 
that pattern before? 


Equation (8.43) was 
also a momentary 
digression. 
If X and Y are independent, the random variable X|y will be essentially the same as X, regardless of the value of y, because Pr(X = x | Y = y) is equal to Pr(X = x) by (8.5); that’s what independence means. But if X and Y are dependent, the random variables X|y and X|y′ need not resemble each other in any way when y ≠ y′.

If X takes only nonnegative integer values, we can decompose its pgf into a sum of conditional pgf’s with respect to any other random variable Y:

$$G_X(z) = \sum_{y\in Y(\Omega)} \Pr(Y = y)\,G_{X|y}(z). \tag{8.92}$$

This holds because the coefficient of $z^x$ on the left side is Pr(X = x), for all $x \in X(\Omega)$, and on the right it is

$$\sum_{y\in Y(\Omega)} \Pr(Y = y)\Pr(X = x \mid Y = y) = \sum_{y\in Y(\Omega)} \Pr(X = x \text{ and } Y = y) = \Pr(X = x).$$

For example, if X is the product of the spots on two fair dice and if Y is the sum of the spots, the pgf for X|6 is

$$G_{X|6}(z) = \tfrac25 z^5 + \tfrac25 z^8 + \tfrac15 z^9$$

because the conditional probabilities for Y = 6 consist of five equally probable events, the rolls (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1). Equation (8.92) in this case reduces to

$$\begin{aligned}
G_X(z) = {}&\tfrac1{36}G_{X|2}(z) + \tfrac2{36}G_{X|3}(z) + \tfrac3{36}G_{X|4}(z) + \tfrac4{36}G_{X|5}(z)\\
&+ \tfrac5{36}G_{X|6}(z) + \tfrac6{36}G_{X|7}(z) + \tfrac5{36}G_{X|8}(z) + \tfrac4{36}G_{X|9}(z)\\
&+ \tfrac3{36}G_{X|10}(z) + \tfrac2{36}G_{X|11}(z) + \tfrac1{36}G_{X|12}(z),
\end{aligned}$$

a formula that is obvious once you understand it. (End of digression.)

Oh, now I understand what mathematicians mean when they say something is “obvious,” “clear,” or “trivial.”
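The dice example can be verified mechanically. This Python sketch enumerates the 36 equally likely rolls and builds each conditional distribution $X|y$ from (8.91). It then recombines them with weights Pr(Y = y), as in (8.92). Representing each pgf by a dictionary of coefficients is a choice made for this sketch.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely (first, second) rolls

# Coefficients of G_X, where X is the product of the spots.
G_X = defaultdict(Fraction)
for a, b in rolls:
    G_X[a * b] += Fraction(1, 36)


def conditional_pgf(y):
    """Return (coefficients of G_{X|y}, Pr(Y = y)) for Y = sum of the spots."""
    given = [(a, b) for a, b in rolls if a + b == y]
    coeffs = defaultdict(Fraction)
    for a, b in given:                          # (8.91): these rolls are equally likely
        coeffs[a * b] += Fraction(1, len(given))
    return dict(coeffs), Fraction(len(given), 36)


print(conditional_pgf(6)[0])   # {5: Fraction(2, 5), 8: Fraction(2, 5), 9: Fraction(1, 5)}

# Recombine the conditional pgfs with weights Pr(Y = y), as in (8.92).
recombined = defaultdict(Fraction)
for y in range(2, 13):
    coeffs, p_y = conditional_pgf(y)
    for x, c in coeffs.items():
        recombined[x] += p_y * c
print(dict(recombined) == dict(G_X))   # True
```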

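In the case of hashing, (8.92) tells us how to write down the pgf for probes in a successful search, if we let X = P and Y = K. For any fixed k between 1 and n, the random variable P|k is defined as a sum of independent random variables $X_1 + \cdots + X_k$; this is (8.89). Here $X_k = [h_k = h_k]$ always equals 1, while each $X_j$ with $j < k$ has pgf $(m - 1 + z)/m$. So it has the pgf

$$G_{P|k}(z) = z\Bigl(\frac{m - 1 + z}{m}\Bigr)^{k-1}.$$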
Therefore the pgf for P itself is clearly

$$G_P(z) = \sum_{k=1}^{n} s_k\,G_{P|k}(z) = \sum_{k=1}^{n} s_k\,z\Bigl(\frac{m-1+z}{m}\Bigr)^{k-1} = z\,S\Bigl(\frac{m-1+z}{m}\Bigr), \tag{8.93}$$

where

$$S(z) = s_1 + s_2 z + s_3 z^2 + \cdots + s_n z^{n-1} \tag{8.94}$$

is the pgf for the search probabilities $s_k$ (divided by z for convenience).

“By clearly, I mean a good freshman should be able to do it, although it’s not completely trivial.”
— Paul Erdős [94]

Good. We have a probability generating function for P; we can now find 
the mean and variance by differentiation. It’s somewhat easier to remove the 
z factor first, as we’ve done before, thus finding the mean and variance of 
P − 1 instead:


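$$\begin{aligned}
F(z) &= G_P(z)/z = S\Bigl(\frac{m-1+z}{m}\Bigr);\\
F'(z) &= \frac1m\,S'\Bigl(\frac{m-1+z}{m}\Bigr);\\
F''(z) &= \frac1{m^2}\,S''\Bigl(\frac{m-1+z}{m}\Bigr).
\end{aligned}$$

Therefore

$$EP = 1 + \mathrm{Mean}(F) = 1 + F'(1) = 1 + m^{-1}\,\mathrm{Mean}(S); \tag{8.95}$$

$$\begin{aligned}
VP = \mathrm{Var}(F) &= F''(1) + F'(1) - F'(1)^2\\
&= m^{-2}S''(1) + m^{-1}S'(1) - m^{-2}S'(1)^2\\
&= m^{-2}\,\mathrm{Var}(S) + (m^{-1} - m^{-2})\,\mathrm{Mean}(S).
\end{aligned}\tag{8.96}$$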


These are general formulas expressing the mean and variance of the num- 
ber of probes P in terms of the mean and variance of the assumed search 
distribution S. 
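Formulas (8.95) and (8.96) can be sanity-checked against a direct enumeration of the probability space (8.86) with the weights (8.87). In this sketch the search probabilities `s` are hypothetical values picked only for the test. Mean(S) and Var(S) are computed from $S(z) = s_1 + s_2z + \cdots + s_nz^{n-1}$, as in (8.94).

```python
from fractions import Fraction
from itertools import product


def probe_count(hs, k):
    """P(h_1, ..., h_n; k) from (8.88): how many of h_1..h_k equal h_k."""
    return sum(hs[j] == hs[k - 1] for j in range(k))


def exact_EP_VP(m, s):
    """Mean and variance of P over the space (8.86) with Pr(omega) = s_k / m**n."""
    n = len(s)
    EP = EP2 = Fraction(0)
    for hs in product(range(1, m + 1), repeat=n):
        for k in range(1, n + 1):
            weight = s[k - 1] / m ** n
            p = probe_count(hs, k)
            EP += weight * p
            EP2 += weight * p * p
    return EP, EP2 - EP ** 2


def formula_EP_VP(m, s):
    """The same two quantities from (8.95) and (8.96)."""
    mean_S = sum(i * si for i, si in enumerate(s))                   # S'(1)
    var_S = sum(i * i * si for i, si in enumerate(s)) - mean_S ** 2
    EP = 1 + mean_S / m
    VP = var_S / m ** 2 + (Fraction(1, m) - Fraction(1, m ** 2)) * mean_S
    return EP, VP


s = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]   # hypothetical s_k
print(exact_EP_VP(3, s) == formula_EP_VP(3, s))   # True
```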

For example, suppose we have $s_k = 1/n$ for $1 \le k \le n$. This means we are doing a purely “random” successful search, with all keys in the table equally likely. Then S(z) is the uniform probability distribution $U_n(z)$ in (8.32), and we have Mean(S) = (n − 1)/2, Var(S) = (n² − 1)/12. Hence

$$EP = \frac{n-1}{2m} + 1; \tag{8.97}$$

$$VP = \frac{n^2-1}{12m^2} + \frac{(m-1)(n-1)}{2m^2} = \frac{(n-1)(6m+n-5)}{12m^2}. \tag{8.98}$$

OK, gang, time to put on your skim suits again.
— Friendly TA


Once again we have gained the desired speedup factor of 1/m. If m = n/ln n and n → ∞, the average number of probes per successful search in this case is about ½ ln n, and the standard deviation is asymptotically $(\ln n)/\sqrt{12}$.

On the other hand, we might suppose that $s_k = (kH_n)^{-1}$ for $1 \le k \le n$; this distribution is called “Zipf’s law.” Then Mean(G) = $n/H_n$ and Var(G) = $\frac12 n(n+1)/H_n - n^2/H_n^2$, where $G(z) = zS(z)$ is the pgf of the index k. The average number of probes for m = n/ln n as n → ∞ is approximately 2, with standard deviation asymptotic to $\sqrt{\ln n}/\sqrt2$.

In both cases the analysis allows the cautious souls among us, who fear 
the worst case, to rest easily: Chebyshev’s inequality tells us that the lists 
will be nice and short, except in extremely rare cases. 

Case 2, continued: Variants of the variance. 

We have just computed the variance of the number of probes in a successful search, by considering P to be a random variable over a probability space with $m^n \cdot n$ elements $(h_1, \ldots, h_n; k)$. But we could have adopted another point of view: Each pattern $(h_1, \ldots, h_n)$ of hash values defines a random variable $P|(h_1, \ldots, h_n)$, representing the probes we make in a successful search of a particular hash table on n given keys. The average value of $P|(h_1, \ldots, h_n)$,


$$A(h_1, \ldots, h_n) = \sum_{p=1}^{n} p\cdot\Pr\bigl((P|(h_1, \ldots, h_n)) = p\bigr), \tag{8.99}$$

can be said to represent the running time of a successful search. This quantity $A(h_1, \ldots, h_n)$ is a random variable that depends only on $(h_1, \ldots, h_n)$, not on the final component k. We can write it in the form

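$$A(h_1, \ldots, h_n) = \sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k),$$

where $P(h_1, \ldots, h_n; k)$ is defined in (8.88), since $P|(h_1, \ldots, h_n) = p$ with probability

$$\frac{\sum_{k=1}^{n}\Pr(h_1, \ldots, h_n; k)\,[P(h_1, \ldots, h_n; k) = p]}{\sum_{k=1}^{n}\Pr(h_1, \ldots, h_n; k)} = \frac{\sum_{k=1}^{n} m^{-n}s_k\,[P(h_1, \ldots, h_n; k) = p]}{\sum_{k=1}^{n} m^{-n}s_k} = \sum_{k=1}^{n} s_k\,[P(h_1, \ldots, h_n; k) = p].$$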
The mean value of $A(h_1, \ldots, h_n)$, obtained by summing over all $m^n$ possibilities $(h_1, \ldots, h_n)$ and dividing by $m^n$, will be the same as the mean value we obtained before in (8.95). But the variance of $A(h_1, \ldots, h_n)$ is something different; this is a variance of $m^n$ averages, not a variance of $m^n \cdot n$ probe counts. For example, if m = 1 (so that there is only one list), the “average” value $A(h_1, \ldots, h_n) = A(1, \ldots, 1)$ is actually constant, so its variance VA is zero; but the number of probes in a successful search is not constant, so the variance VP is nonzero.

We can illustrate this difference between variances by carrying out the calculations for general m and n in the simplest case, when $s_k = 1/n$ for $1 \le k \le n$. In other words, we will assume temporarily that there is a uniform distribution of search keys. Any given sequence of hash values $(h_1, \ldots, h_n)$ defines m lists that contain respectively $(n_1, n_2, \ldots, n_m)$ entries for some numbers $n_j$, where

$$n_1 + n_2 + \cdots + n_m = n.$$

A successful search in which each of the n keys in the table is equally likely will have an average running time of

$$\begin{aligned}
A(h_1, \ldots, h_n) &= \frac{(1 + \cdots + n_1) + (1 + \cdots + n_2) + \cdots + (1 + \cdots + n_m)}{n}\\
&= \frac{n_1(n_1+1) + n_2(n_2+1) + \cdots + n_m(n_m+1)}{2n}\\
&= \frac{n_1^2 + n_2^2 + \cdots + n_m^2 + n}{2n}
\end{aligned}$$

probes. Our goal is to calculate the variance of this quantity $A(h_1, \ldots, h_n)$, over the probability space consisting of all $m^n$ sequences $(h_1, \ldots, h_n)$.

The calculations will be simpler, it turns out, if we compute the variance of a slightly different quantity,

$$B(h_1, \ldots, h_n) = \binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}.$$

We have

$$A(h_1, \ldots, h_n) = 1 + B(h_1, \ldots, h_n)/n,$$

hence the mean and variance of A satisfy

$$EA = 1 + \frac{EB}{n};\qquad VA = \frac{VB}{n^2}. \tag{8.100}$$


But the VP is 
nonzero only in an 
election year. 
The probability that the list sizes will be $n_1, n_2, \ldots, n_m$ is the multinomial coefficient

$$\binom{n}{n_1, n_2, \ldots, n_m} = \frac{n!}{n_1!\,n_2!\ldots n_m!}$$

divided by $m^n$; hence the pgf for $B(h_1, \ldots, h_n)$ is

$$B_n(z) = \sum_{\substack{n_1, n_2, \ldots, n_m \ge 0\\ n_1 + n_2 + \cdots + n_m = n}} \binom{n}{n_1, n_2, \ldots, n_m}\, z^{\binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}}\, m^{-n}.$$

This sum looks a bit scary to inexperienced eyes, but our experiences in Chapter 7 have taught us to recognize it as an m-fold convolution. Indeed, if we consider the exponential super-generating function

$$G(w, z) = \sum_{n\ge0} B_n(z)\,\frac{m^n w^n}{n!},$$

we can readily verify that G(w, z) is simply an mth power:

$$G(w, z) = \Bigl(\sum_{k\ge0} z^{\binom k2}\frac{w^k}{k!}\Bigr)^{m}.$$

As a check, we can try setting z = 1; we get $G(w, 1) = (e^w)^m$, so the coefficient of $m^n w^n/n!$ is $B_n(1) = 1$.

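If we knew the values of $B_n'(1)$ and $B_n''(1)$, we would be able to calculate $\mathrm{Var}(B_n)$. So we take partial derivatives of G(w, z) with respect to z:

$$\begin{aligned}
\frac{\partial}{\partial z}G(w, z) &= \sum_{n\ge0} B_n'(z)\,\frac{m^n w^n}{n!} = m\Bigl(\sum_{k\ge0} z^{\binom k2}\frac{w^k}{k!}\Bigr)^{m-1}\sum_{k\ge0}\binom k2 z^{\binom k2 - 1}\frac{w^k}{k!};\\
\frac{\partial^2}{\partial z^2}G(w, z) &= \sum_{n\ge0} B_n''(z)\,\frac{m^n w^n}{n!} = m(m-1)\Bigl(\sum_{k\ge0} z^{\binom k2}\frac{w^k}{k!}\Bigr)^{m-2}\Bigl(\sum_{k\ge0}\binom k2 z^{\binom k2 - 1}\frac{w^k}{k!}\Bigr)^{2}\\
&\qquad + m\Bigl(\sum_{k\ge0} z^{\binom k2}\frac{w^k}{k!}\Bigr)^{m-1}\sum_{k\ge0}\binom k2\Bigl(\binom k2 - 1\Bigr) z^{\binom k2 - 2}\frac{w^k}{k!}.
\end{aligned}$$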
Complicated, yes; but everything simplifies greatly when we set z = 1. For example, we have

$$\begin{aligned}
\sum_{n\ge0} B_n'(1)\,\frac{m^n w^n}{n!} &= m e^{(m-1)w}\sum_{k\ge2}\frac{w^k}{2(k-2)!} = m e^{(m-1)w}\sum_{k\ge0}\frac{w^{k+2}}{2k!}\\
&= \frac{m w^2 e^{mw}}{2} = \sum_{n\ge0}\frac{m^{n+1}w^{n+2}}{2n!} = \sum_{n\ge0}\frac{n(n-1)m^{n-1}w^{n}}{2n!},
\end{aligned}$$

and it follows that

$$B_n'(1) = \binom n2\Big/m. \tag{8.101}$$


The expression for EA in (8.100) now gives $EA = 1 + (n-1)/(2m)$, in agreement with (8.97).

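The formula for $B_n''(1)$ involves the similar sum

$$\begin{aligned}
\sum_{k\ge0}\binom k2\Bigl(\binom k2 - 1\Bigr)\frac{w^k}{k!} &= \frac14\sum_{k\ge0}\frac{(k+1)k(k-1)(k-2)w^k}{k!} = \frac14\sum_{k\ge3}\frac{(k+1)w^k}{(k-3)!}\\
&= \frac14\sum_{k\ge0}\frac{(k+4)w^{k+3}}{k!} = \bigl(\tfrac14 w^4 + w^3\bigr)e^w;
\end{aligned}$$

hence we find that

$$\begin{aligned}
\sum_{n\ge0} B_n''(1)\,\frac{m^n w^n}{n!} &= m(m-1)e^{w(m-2)}\bigl(\tfrac12 w^2 e^w\bigr)^2 + m e^{w(m-1)}\bigl(\tfrac14 w^4 + w^3\bigr)e^w\\
&= m e^{wm}\bigl(\tfrac14 m w^4 + w^3\bigr);
\end{aligned}$$

$$B_n''(1) = \binom n2\Bigl(\binom n2 - 1\Bigr)\Big/m^2. \tag{8.102}$$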


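Now we can put all the pieces together and evaluate the desired variance VA. Massive cancellation occurs, and the result is surprisingly simple:

$$\begin{aligned}
VA = \frac{VB}{n^2} &= \frac{B_n''(1) + B_n'(1) - B_n'(1)^2}{n^2}\\
&= \frac{n(n-1)}{m^2n^2}\Bigl(\frac{(n+1)(n-2)}{4} + \frac m2 - \frac{n(n-1)}{4}\Bigr)\\
&= \frac{(m-1)(n-1)}{2m^2 n}.
\end{aligned}\tag{8.103}$$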
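Because $B_n'(1) = E(B)$ and $B_n''(1) = E(B(B-1))$, equations (8.101) through (8.103) can be confirmed by enumerating all $m^n$ hash sequences for small m and n. The values m = 3, n = 5 in this Python sketch are arbitrary, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def B_derivatives(m, n):
    """Return (B_n'(1), B_n''(1)) = (E B, E B(B-1)) by enumerating all m**n
    hash sequences; B is the sum of C(n_i, 2) over the m list sizes n_i."""
    weight = Fraction(1, m ** n)
    d1 = d2 = Fraction(0)
    for hs in product(range(m), repeat=n):
        b = sum(comb(hs.count(i), 2) for i in range(m))
        d1 += weight * b
        d2 += weight * b * (b - 1)
    return d1, d2


m, n = 3, 5
d1, d2 = B_derivatives(m, n)
C = comb(n, 2)
print(d1 == Fraction(C, m), d2 == Fraction(C * (C - 1), m ** 2))   # True True
VA = (d2 + d1 - d1 ** 2) / n ** 2          # VB / n^2, as in (8.100)
print(VA, Fraction((m - 1) * (n - 1), 2 * m * m * n))              # 4/45 4/45
```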
When such “coincidences” occur, we suspect that there’s a mathematical reason; there might be another way to attack the problem, explaining why the answer has such a simple form. And indeed, there is another approach (in exercise 61), which shows that the variance of the average successful search has the general form

$$VA = \frac{m-1}{m^2}\sum_{k=1}^{n} s_k^2\,(k-1) \tag{8.104}$$

when $s_k$ is the probability that the kth-inserted element is being sought. Equation (8.103) is the special case $s_k = 1/n$ for $1 \le k \le n$.

Where have I seen that pattern before?

Where have I seen that graffito before?

Besides the variance of the average, we might also consider the average of the variance. In other words, each sequence $(h_1, \ldots, h_n)$ that defines a hash table also defines a probability distribution for successful searching, and the variance of this probability distribution tells how spread out the number of probes will be in different successful searches. For example, let’s go back to the case where we inserted n = 16 things into m = 10 lists:

$$\begin{aligned}
(h_1, \ldots, h_{16}) &= 3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9\,7\,9\,3\\
(P_1, \ldots, P_{16}) &= 1\,1\,1\,2\,1\,1\,1\,1\,2\,2\,3\,1\,2\,1\,3\,3
\end{aligned}$$

A successful search in the resulting hash table has the pgf

$$G(3, 1, 4, 1, \ldots, 3) = \sum_{k=1}^{16} s_k\,z^{P(3,1,4,1,\ldots,3;\,k)}.$$

We have just considered the average number of probes in a successful search of this table, namely $A(3, 1, 4, 1, \ldots, 3) = \mathrm{Mean}(G(3, 1, 4, 1, \ldots, 3))$. We can also consider the variance,

$$s_1\cdot1^2 + s_2\cdot1^2 + s_3\cdot1^2 + s_4\cdot2^2 + \cdots + s_{16}\cdot3^2 - (s_1\cdot1 + s_2\cdot1 + s_3\cdot1 + s_4\cdot2 + \cdots + s_{16}\cdot3)^2.$$

This variance is a random variable, depending on $(h_1, \ldots, h_n)$, so it is natural to consider its average value.
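For this particular table, the mean and the variance just displayed are simple finite sums. The Python sketch below evaluates them under an added assumption that every key is equally likely ($s_k = 1/16$). The text itself leaves the $s_k$ general.

```python
from fractions import Fraction

hashes = [int(d) for d in "3141592653589793"]   # (h_1, ..., h_16) from the text
s = [Fraction(1, 16)] * 16                       # assumption: uniform search probabilities

# P(3,1,4,1,...,3; k): how many of h_1..h_k equal h_k, as in (8.88).
P = [hashes[:k + 1].count(hashes[k]) for k in range(16)]

mean = sum(sk * p for sk, p in zip(s, P))         # A(3,1,4,1,...,3)
second = sum(sk * p * p for sk, p in zip(s, P))
variance = second - mean ** 2                     # the variance displayed above
print(mean, variance)                             # 13/8 39/64
```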

In other words, there are three natural kinds of variance that we may wish to know, in order to understand the behavior of a successful search: The overall variance of the number of probes, taken over all $(h_1, \ldots, h_n)$ and k; the variance of the average number of probes, where the average is taken over all k and the variance is then taken over all $(h_1, \ldots, h_n)$; and the average of the variance of the number of the probes, where the variance is taken over all k and the average is then taken over all $(h_1, \ldots, h_n)$. In symbols, the overall variance is


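$$VP = \sum_{1\le h_1,\ldots,h_n\le m}\ \sum_{k=1}^{n}\frac{s_k}{m^n}P(h_1, \ldots, h_n; k)^2 - \Bigl(\sum_{1\le h_1,\ldots,h_n\le m}\ \sum_{k=1}^{n}\frac{s_k}{m^n}P(h_1, \ldots, h_n; k)\Bigr)^2;$$

the variance of the average is

$$VA = \sum_{1\le h_1,\ldots,h_n\le m}\frac1{m^n}\Bigl(\sum_{k=1}^{n}s_kP(h_1, \ldots, h_n; k)\Bigr)^2 - \Bigl(\sum_{1\le h_1,\ldots,h_n\le m}\frac1{m^n}\sum_{k=1}^{n}s_kP(h_1, \ldots, h_n; k)\Bigr)^2;$$

and the average of the variance is

$$AV = \sum_{1\le h_1,\ldots,h_n\le m}\frac1{m^n}\Bigl(\sum_{k=1}^{n}s_kP(h_1, \ldots, h_n; k)^2 - \Bigl(\sum_{k=1}^{n}s_kP(h_1, \ldots, h_n; k)\Bigr)^2\Bigr).$$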


It turns out that these three quantities are interrelated in a simple way:

$$VP = VA + AV. \tag{8.105}$$

In fact, conditional probability distributions always satisfy the identity

$$VX = V(E(X|Y)) + E(V(X|Y)) \tag{8.106}$$

if X and Y are random variables in any probability space and if X takes real values. (This identity is proved in exercise 22.) Equation (8.105) is the special case where X is the number of probes in a successful search and Y is the sequence of hash values $(h_1, \ldots, h_n)$.

The general equation (8.106) needs to be understood carefully, because the notation tends to conceal the different random variables and probability spaces in which expectations and variances are being calculated. For each y in the range of Y, we have defined the random variable X|y in (8.91), and this random variable has an expected value E(X|y) depending on y. Now E(X|Y) denotes the random variable whose values are E(X|y) as y ranges over all possible values of Y, and V(E(X|Y)) is the variance of this random variable with respect to the probability distribution of Y. Similarly, E(V(X|Y)) is the average of the random variables V(X|y) as y varies. On the left of (8.106) is VX, the unconditional variance of X. Since variances are nonnegative, we always have

$$VX \ge V(E(X|Y)) \quad\text{and}\quad VX \ge E(V(X|Y)). \tag{8.107}$$

(Now is a good time to do warmup exercise 6.)

P is still the number of probes.
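The decomposition (8.105) and the closed form (8.104) can both be tested by exhaustive enumeration. This sketch computes VP, VA, and AV directly from their definitions for a small table. The search probabilities and m = 3 are hypothetical test values. The (8.105) check works because regrouping the sums gives the identity exactly, which is the content of (8.106). The (8.104) check is a genuinely independent test.

```python
from fractions import Fraction
from itertools import product


def variance_triple(m, s):
    """Return (VP, VA, AV) from their definitions, enumerating all m**n tables."""
    n = len(s)
    weight = Fraction(1, m ** n)          # probability of one table (h_1..h_n)
    mean = second = mean_sq = av = Fraction(0)
    for hs in product(range(m), repeat=n):
        # P(h_1..h_n; k) from (8.88); index k-1 holds the kth key.
        P = [hs[:k + 1].count(hs[k]) for k in range(n)]
        A = sum(sk * p for sk, p in zip(s, P))          # this table's average
        M2 = sum(sk * p * p for sk, p in zip(s, P))     # this table's second moment
        mean += weight * A
        second += weight * M2
        mean_sq += weight * A * A
        av += weight * (M2 - A * A)                     # this table's variance
    VP = second - mean ** 2                             # overall variance
    VA = mean_sq - mean ** 2                            # variance of the average
    return VP, VA, av                                   # av: average of the variance


m = 3
s = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]   # hypothetical s_k
VP, VA, AV = variance_triple(m, s)
print(VP == VA + AV)                                                    # (8.105): True
print(VA == Fraction(m - 1, m * m) * sum(sk * sk * i for i, sk in enumerate(s)))   # (8.104): True
```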

Case 1, again: Unsuccessful search revisited. 

Let’s bring our microscopic examination of hashing to a close by doing one 
more calculation typical of algorithmic analysis. This time we’ll look more 
closely at the total running time associated with an unsuccessful search, 
assuming that the computer will insert the previously unknown key into its 
memory. 

The insertion process in (8.83) has two cases, depending on whether j is negative or zero. We have j < 0 if and only if P = 0, since a negative value comes from the FIRST entry of an empty list. Thus, if the list was previously empty, we have P = 0 and we must set $\mathrm{FIRST}[h_{n+1}] := n + 1$. (The new record will be inserted into position n + 1.) Otherwise we have P > 0 and we must set a NEXT entry to n + 1. These two cases may take different amounts of time; therefore the total running time for an unsuccessful search has the form

$$T = \alpha + \beta P + \delta[P = 0], \tag{8.108}$$

where α, β, and δ are constants that depend on the computer being used and on the way in which hashing is encoded in that machine’s internal language. It would be nice to know the mean and variance of T, since such information is more relevant in practice than the mean and variance of P.

So far we have used probability generating functions only in connection with random variables that take nonnegative integer values. But it turns out that we can deal in essentially the same way with

$$G_X(z) = \sum_{\omega\in\Omega}\Pr(\omega)\,z^{X(\omega)}$$

when X is any real-valued random variable, because the essential characteristics of X depend only on the behavior of $G_X$ near z = 1, where powers of z are well defined. For example, the running time (8.108) of an unsuccessful search is a random variable, defined on the probability space of equally likely hash values $(h_1, \ldots, h_n, h_{n+1})$ with $1 \le h_j \le m$; we can consider the series

$$G_T(z) = \frac1{m^{n+1}}\sum_{h_1=1}^{m}\cdots\sum_{h_n=1}^{m}\sum_{h_{n+1}=1}^{m} z^{\alpha + \beta P(h_1,\ldots,h_{n+1}) + \delta[P(h_1,\ldots,h_{n+1}) = 0]}$$

to be a pgf even when α, β, and δ are not integers. (In fact, the parameters α, β, δ are physical quantities that have dimensions of time; they aren’t even pure numbers! Yet we can use them in the exponent of z.) We can still calculate the mean and variance of T, by evaluating $G_T'(1)$ and $G_T''(1)$ and combining these values in the usual way.

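The generating function for P instead of T is

$$P(z) = \Bigl(\frac{m-1+z}{m}\Bigr)^{n} = \sum_{p\ge0}\Pr(P = p)\,z^p.$$

Therefore we have

$$\begin{aligned}
G_T(z) &= \sum_{p\ge0}\Pr(P = p)\,z^{\alpha + \beta p + \delta[p = 0]}\\
&= z^{\alpha}\Bigl((z^{\delta} - 1)\Pr(P = 0) + \sum_{p\ge0}\Pr(P = p)\,z^{\beta p}\Bigr)\\
&= z^{\alpha}\Bigl(\Bigl(\frac{m-1}{m}\Bigr)^{n}(z^{\delta} - 1) + \Bigl(\frac{m-1+z^{\beta}}{m}\Bigr)^{n}\Bigr).
\end{aligned}$$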

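The determination of $\mathrm{Mean}(G_T)$ and $\mathrm{Var}(G_T)$ is now routine:

$$\begin{aligned}
G_T''(1) &= \alpha(\alpha-1) + 2\alpha\beta\frac nm + \beta(\beta-1)\frac nm + \beta^2\frac{n(n-1)}{m^2}\\
&\qquad + 2\alpha\delta\Bigl(\frac{m-1}{m}\Bigr)^{n} + \delta(\delta-1)\Bigl(\frac{m-1}{m}\Bigr)^{n};
\end{aligned}$$

$$\mathrm{Mean}(G_T) = G_T'(1) = \alpha + \beta\frac nm + \delta\Bigl(\frac{m-1}{m}\Bigr)^{n}; \tag{8.109}$$

$$\begin{aligned}
\mathrm{Var}(G_T) &= G_T''(1) + G_T'(1) - G_T'(1)^2\\
&= \beta^2\frac{n(m-1)}{m^2} - 2\beta\delta\Bigl(\frac{m-1}{m}\Bigr)^{n}\frac nm + \delta^2\Bigl(\Bigl(\frac{m-1}{m}\Bigr)^{n} - \Bigl(\frac{m-1}{m}\Bigr)^{2n}\Bigr).
\end{aligned}\tag{8.110}$$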

In Chapter 9 we will learn how to estimate quantities like this when $m$ and $n$ are large. If, for example, $m = n$ and $n \to \infty$, the techniques of Chapter 9 will show that the mean and variance of $T$ are respectively $\alpha + \beta + \delta e^{-1} + O(n^{-1})$ and $\beta^2 - 2\beta\delta e^{-1} + \delta^2(e^{-1} - e^{-2}) + O(n^{-1})$. If $m = n/\ln n$ and $n \to \infty$ the corresponding results are

$$\mathrm{Mean}(G_T) = \beta\ln n + \alpha + \delta/n + O\bigl((\log n)^2/n^2\bigr) ;$$

$$\mathrm{Var}(G_T) = \beta^2\ln n - \bigl((\beta\ln n)^2 + 2\beta\delta\ln n - \delta^2\bigr)/n + O\bigl((\log n)^3/n^2\bigr) .$$
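These limiting statements can also be probed numerically. The sketch below evaluates (8.109) and (8.110) in floating point, treating $m$ as a real parameter (an assumption, since $n/\ln n$ is not an integer) and using natural logarithms, then compares the results with the asymptotic forms quoted above. It only illustrates the claims; the derivations themselves are the business of Chapter 9, and the helper names are hypothetical.

```python
import math


def exact_mean_var(m, n, alpha, beta, delta):
    """Mean and variance of T from (8.109) and (8.110); m may be any real number > 1."""
    a = math.exp(n * math.log1p(-1.0 / m))   # ((m-1)/m)^n, computed without cancellation
    mean = alpha + beta * n / m + delta * a
    var = (beta ** 2 * n * (m - 1) / m ** 2
           - 2 * beta * delta * a * n / m
           + delta ** 2 * (a - a * a))
    return mean, var


def approx_m_equals_n(alpha, beta, delta):
    """Leading terms of the mean and variance when m = n (errors are O(1/n))."""
    e1 = math.exp(-1.0)
    return (alpha + beta + delta * e1,
            beta ** 2 - 2 * beta * delta * e1 + delta ** 2 * (e1 - e1 ** 2))


def approx_m_equals_n_over_ln_n(n, alpha, beta, delta):
    """Asymptotic mean and variance when m = n / ln n, with the O-terms dropped."""
    L = math.log(n)
    return (beta * L + alpha + delta / n,
            beta ** 2 * L - ((beta * L) ** 2 + 2 * beta * delta * L - delta ** 2) / n)
```

As $n$ grows, the gap between `exact_mean_var` and the corresponding approximation shrinks at the rate promised by the error terms, about $1/n$ in the first case and about $(\log n)^3/n^2$ in the second.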
Why only ten
numbers?

The other students
either weren’t
empiricists or
they were just too
flipped out.


Exercises 

Warmups 

1 What’s the probability of doubles in the probability distribution $\Pr_{01}$ of (8.3), when one die is fair and the other is loaded? What’s the probability that $S = 7$ is rolled?

2 What’s the probability that the top and bottom cards of a randomly shuffled deck are both aces? (All 52! permutations have probability 1/52!.)

3 Stanford’s Concrete Math students were asked in 1979 to flip coins until 
they got heads twice in succession, and to report the number of flips 
required. The answers were 

3, 2, 3, 5, 10, 2, 6, 6, 9, 2.

Princeton’s Concrete Math students were asked in 1987 to do a similar 
thing, with the following results: 

10, 2, 10, 7, 5, 2, 10, 6, 10, 2. 

Estimate the mean and variance, based on (a) the Stanford sample; 
(b) the Princeton sample. 

4 Let $H(z) = F(z)/G(z)$, where $F(1) = G(1) = 1$. Prove that

$$\mathrm{Mean}(H) = \mathrm{Mean}(F) - \mathrm{Mean}(G) , \qquad \mathrm{Var}(H) = \mathrm{Var}(F) - \mathrm{Var}(G) ,$$

in analogy with (8.38) and (8.39), if the indicated derivatives exist at $z = 1$.

5 Suppose Alice and Bill play the game (8.78) with a biased coin that comes
up heads with probability p. Is there a value of p for which the game 
becomes fair? 

6 What does the conditional variance law (8.106) reduce to, when X and Y
are independent random variables? 

Basics 

7 Show that if two dice are loaded with the same probability distribution, 
the probability of doubles is always at least $\frac16$.

8 Let $A$ and $B$ be events such that $A \cup B = \Omega$. Prove that

$$\Pr(\omega\in A\cap B) = \Pr(\omega\in A)\Pr(\omega\in B) - \Pr(\omega\notin A)\Pr(\omega\notin B) .$$

9 Prove or disprove: If X and Y are independent random variables, then so 
are F(X) and G(Y), when F and G are any functions. 
10 What’s the maximum number of elements that can be medians of a random variable X, according to definition (8.7)?

11 Construct a random variable that has finite mean and infinite variance. 

12 a If $P(z)$ is the pgf for the random variable $X$, prove that

$$\Pr(X \le r) \le x^{-r} P(x) \quad\text{for } 0 < x \le 1 ;$$

$$\Pr(X \ge r) \le x^{-r} P(x) \quad\text{for } x \ge 1 .$$

(These important relations are called the tail inequalities.)
b In the special case $P(z) = (1+z)^n/2^n$, use the first tail inequality to prove that $\sum_{k\le\alpha n}\binom{n}{k} \le 1/\bigl(\alpha^{\alpha n}(1-\alpha)^{(1-\alpha)n}\bigr)$ when $0 < \alpha < \frac12$.

13 If $X_1, \ldots, X_{2n}$ are independent random variables with the same distribution, and if $\alpha$ is any real number whatsoever, prove that

$$\Pr\left(\left|\frac{X_1+\cdots+X_{2n}}{2n} - \alpha\right| \le \left|\frac{X_1+\cdots+X_n}{n} - \alpha\right|\right) \ge \frac12 .$$


14 Let F(z) and G(z) be probability generating functions, and let 


H(z) = p F(z) + q G(z) 


where p + q = 1. (This is called a mixture of F and G; it corresponds to 
flipping a coin and choosing probability distribution F or G depending on 
whether the coin comes up heads or tails.) Find the mean and variance 
of H in terms of p, q, and the mean and variance of F and G. 

15 If F(z) and G(z) are probability generating functions, we can define an- 
other pgf H(z) by “composition”: 

H(z) = F(G(z)). 

Express Mean(H) and Var(H) in terms of Mean(F), Var(F), Mean(G), 
and Var(G). (Equation (8.93) is a special case.)

16 Find a closed form for the super generating function $\sum_{n\ge0} F_n(z)\, w^n$, when $F_n(z)$ is the football-fixation generating function defined in (8.53).

17 Let $X_{n,p}$ and $Y_{n,p}$ have the binomial and negative binomial distributions, respectively, with parameters $(n,p)$. (These distributions are defined in (8.57) and (8.60).) Prove that $\Pr(Y_{n,p} \le m) = \Pr(X_{m+n,p} \ge n)$. What identity in binomial coefficients does this imply?

18 A random variable X is said to have the Poisson distribution with mean $\mu$ if $\Pr(X = k) = e^{-\mu}\mu^k/k!$ for all $k \ge 0$.

a What is the pgf of such a random variable? 
b What are its mean, variance, and other cumulants? 


The distribution of 
fish per unit volume 
of water. 
19 Continuing the previous exercise, let $X_1$ be a random Poisson variable with mean $\mu_1$, and let $X_2$ be a random Poisson variable with mean $\mu_2$, independent of $X_1$.

a What is the probability that $X_1 + X_2 = n$?

b What are the mean, variance, and other cumulants of $2X_1 + 3X_2$?

20 Prove (8.74) and (8.75), the general formulas for mean and variance of 
the time needed to wait for a given pattern of heads and tails. 

21 What does the value of N represent, if H and T are both set equal to 4 
in (8.77)? 

22 Prove (8.106), the law of conditional expectations and variances. 
Homework exercises 

23 Let $\Pr_{00}$ be the probability distribution of two fair dice, and let $\Pr_{11}$ be the probability distribution of two loaded dice as given in (8.2). Find all events $A$ such that $\Pr_{00}(A) = \Pr_{11}(A)$. Which of these events depend only on the random variable $S$? (A probability space with $\Omega = D^2$ has $2^{36}$ events; only $2^{11}$ of those events depend on $S$ alone.)

24 Player J rolls $2n+1$ fair dice and removes those that come up ⚅. Player K then calls a number between 1 and 6, rolls the remaining dice, and removes those that show the number called. This process is repeated until no dice remain. The player who has removed the most total dice ($n+1$ or more) is the winner.

a What are the mean and variance of the total number of dice that J removes? Hint: The dice are independent.
b What’s the probability that J wins, when $n = 2$?

25 Consider a gambling game in which you stake a given amount $A$ and you roll a fair die. If $k$ spots turn up, you multiply your stake by $2(k-1)/5$. (In particular, you double the stake whenever you roll ⚅, but you lose everything if you roll ⚀.) You can stop at any time and reclaim the current stake. What are the mean and variance of your stake after $n$ rolls? (Ignore any effects of rounding to integer amounts of currency.)

26 Find the mean and variance of the number of $l$-cycles in a random permutation of $n$ elements. (The football victory problem discussed in (8.23), (8.24), and (8.53) is the special case $l = 1$.)

27 Let $X_1, X_2, \ldots, X_n$ be independent samples of the random variable $X$. Equations (8.19) and (8.20) explain how to estimate the mean and variance of $X$ on the basis of these observations; give an analogous formula for estimating the third cumulant $\kappa_3$. (Your formula should be an “unbiased” estimate, in the sense that its expected value should be $\kappa_3$.)
28 What is the average length of the coin-flipping game (8.78),

a given that Alice wins? 

b given that Bill wins? 

29 Alice, Bill, and Computer flip a fair coin until one of the respective 
patterns A = HHTH, B = HTHH, or C = THHH appears for the first time. 
(If only two of these patterns were involved, we know from (8.82) that A 
would probably beat B, that B would probably beat C, and that C would 
probably beat A; but all three patterns are simultaneously in the game.) 
What are each player’s chances of winning? 

30 The text considers three kinds of variances associated with successful 
search in a hash table. Actually there are two more: We can consider the 
average (over $k$) of the variances (over $h_1, \ldots, h_n$) of $P(h_1, \ldots, h_n; k)$; and we can consider the variance (over $k$) of the averages (over $h_1, \ldots, h_n$). Evaluate these quantities.

31 An apple is located at vertex A of pentagon ABCDE, and a worm is 
located two vertices away, at C. Every day the worm crawls with equal 
probability to one of the two adjacent vertices. Thus after one day the 
worm is at vertex B or vertex D, each with probability $\frac12$. After two
days, the worm might be back at C again, because it has no memory of 
previous positions. When it reaches vertex A, it stops to dine. 

a What are the mean and variance of the number of days until dinner?
b Let $p$ be the probability that the number of days is 100 or more. What does Chebyshev’s inequality say about $p$?
c What do the tail inequalities (exercise 12) tell us about $p$?

32 Alice and Bill are in the military, stationed in one of the five states 
Kansas, Nebraska, Missouri, Oklahoma, or Colorado. Initially Alice is in 
Nebraska and Bill is in Oklahoma. Every month each person is reassigned 
to an adjacent state, each adjacent state being equally likely. (Here’s a 
diagram of the adjacencies: 



The initial states are circled.) For example, Alice is restationed after the 
first month to Colorado, Kansas, or Missouri, each with probability 1 /3. 
Find the mean and variance of the number of months it takes Alice and 
Bill to find each other. (You may wish to enlist a computer’s help.) 


Schrödinger’s worm.


Definitely a finite- 
state situation. 
(Use a calculator for
the numerical work 
on this problem.) 


33 Are the random variables $X_1$ and $X_2$ in (8.89) independent?

34 Gina is a golfer who has probability p = .05 on each stroke of making a 
“supershot” that gains a stroke over par, probability q = .91 of making 
an ordinary shot, and probability r = .04 of making a “subshot” that 
costs her a stroke with respect to par. (Non-golfers: At each turn she 
advances 2, 1, or 0 steps toward her goal, with probability p, q, or r, 
respectively. On a par-m hole, her score is the minimum n such that she 
has advanced m or more steps after taking n turns. A low score is better 
than a high score.) 

a Show that Gina wins a par-4 hole more often than she loses, when 
she plays against a player who shoots par. (In other words, the 
probability that her score is less than 4 is greater than the probability 
that her score is greater than 4.) 

b Show that her average score on a par-4 hole is greater than 4. (There- 
fore she tends to lose against a “steady” player on total points, al- 
though she would tend to win in match play by holes.) 


Exam problems 


35 A die has been loaded with the probability distribution 


$$\Pr(\text{⚀}) = p_1 ;\quad \Pr(\text{⚁}) = p_2 ;\quad \cdots ;\quad \Pr(\text{⚅}) = p_6 .$$

Let $S_n$ be the sum of the spots after this die has been rolled $n$ times. Find a necessary and sufficient condition on the “loading distribution” such that the two random variables $S_n \bmod 2$ and $S_n \bmod 3$ are independent of each other, for all $n$.

36 The six faces of a certain die contain the spot patterns

[Figure: the six nonstandard spot patterns]

instead of the usual ⚀ through ⚅.

a Show that there is a way to assign spots to the six faces of another 
die so that, when these two dice are thrown, the sum of spots has the 
same probability distribution as the sum of spots on two ordinary 
dice. (Assume that all 36 face pairs are equally likely.) 
b Generalizing, find all ways to assign spots to the 6n faces of n dice so 
that the distribution of spot sums will be the same as the distribution 
of spot sums on n ordinary dice. (Each face should receive a positive 
integer number of spots.) 

37 Let $p_n$ be the probability that exactly $n$ tosses of a fair coin are needed before heads are seen twice in a row, and let $q_n = \sum_{k\ge n} p_k$. Find closed forms for both $p_n$ and $q_n$ in terms of Fibonacci numbers.
38 What is the probability generating function for the number of times you
need to roll a fair die until all six faces have turned up? Generalize to 
m-sided fair dice: Give closed forms for the mean and variance of the 
number of rolls needed to see l of the m faces. What is the probability 
that this number will be exactly n? 

39 A Dirichlet probability generating function has the form

$$P(z) = \sum_{n\ge1} \frac{p_n}{n^z} .$$

Thus $P(0) = 1$. If $X$ is a random variable with $\Pr(X = n) = p_n$, express $E(X)$, $V(X)$, and $E(\ln X)$ in terms of $P(z)$ and its derivatives.

40 The $m$th cumulant $\kappa_m$ of the binomial distribution (8.57) has the form $n f_m(p)$, where $f_m$ is a polynomial of degree $m$. (For example, $f_1(p) = p$ and $f_2(p) = p - p^2$, because the mean and variance are $np$ and $npq$.)

a Find a closed form for the coefficient of $p^k$ in $f_m(p)$.
b Prove that $f_m(\frac12) = (2^m - 1)B_m/m + [m = 1]$, where $B_m$ is the $m$th Bernoulli number.

41 Let the random variable $X_n$ be the number of flips of a fair coin until heads have turned up a total of $n$ times. Show that $E(X_{n+1}^{-1}) = (-1)^n\bigl(\ln 2 + H_{\lfloor n/2\rfloor} - H_n\bigr)$. Use the methods of Chapter 9 to estimate this value with an absolute error of $O(n^{-3})$.

42 A certain man has a problem finding work. If he is unemployed on any given morning, there’s constant probability $p_h$ (independent of past history) that he will be hired before that evening; but if he’s got a job when the day begins, there’s constant probability $p_f$ that he’ll be laid off by nightfall. Find the average number of evenings on which he will have a job lined up, assuming that he is initially employed and that this process goes on for $n$ days. (For example, if $n = 1$ the answer is $1 - p_f$.)

43 Find a closed form for the pgf $G_n(z) = \sum_{k\ge0} p_{k,n} z^k$, where $p_{k,n}$ is the probability that a random permutation of $n$ objects has exactly $k$ cycles. What are the mean and standard deviation of the number of cycles?

44 The athletic department runs an intramural “knockout tournament” for $2^n$ tennis players as follows. In the first round, the players are paired off randomly, with each pairing equally likely, and $2^{n-1}$ matches are played. The winners advance to the second round, where the same process produces $2^{n-2}$ winners. And so on; the $k$th round has $2^{n-k}$ randomly chosen matches between the $2^{n-k+1}$ players who are still undefeated. The $n$th round produces the champion. Unbeknownst to the tournament organizers, there is actually an ordering among the players, so that $x_1$ is best, $x_2$

Does TeX choose
optimal line breaks?
A peculiar set of
tennis players. 


“A fast arithmetic 
computation shows 
that the sherry is 
always at least three 
years old. Taking 
computation further 
gives the vertigo.” 

— Revue du vin de 
France (Nov 1984) 


is second best, $\ldots$, $x_{2^n}$ is worst. When $x_j$ plays $x_k$ and $j < k$, the winner is $x_j$ with probability $p$ and $x_k$ with probability $1 - p$, independent of the other matches. We assume that the same probability $p$ applies to all $j$ and $k$.

a What’s the probability that $x_1$ wins the tournament?
b What’s the probability that the $n$th round (the final match) is between the top two players, $x_1$ and $x_2$?
c What’s the probability that the best $2^k$ players are the competitors in the $k$th-to-last round? (The previous questions were the cases $k = 0$ and $k = 1$.)

d Let $N(n)$ be the number of essentially different tournament results; two tournaments are essentially the same if the matches take place between the same players and have the same winners. Prove that $N(n) = (2^n)!$.

e What’s the probability that $x_2$ wins the tournament?
f Prove that if $\frac12 < p < 1$, the probability that $x_j$ wins is strictly greater than the probability that $x_{j+1}$ wins, for $1 \le j < 2^n$.

45 True sherry is made in Spain according to a multistage system called 
“Solera.” For simplicity we’ll assume that the winemaker has only three 
barrels, called A, B, and C. Every year a third of the wine from barrel C 
is bottled and replaced by wine from B; then B is topped off with a third 
of the wine from A; finally A is topped off with new wine. Let A(z), B(z), 
C(z) be probability generating functions, where the coefficient of $z^n$ is
the fraction of n-year-old wine in the corresponding barrel just after the 
transfers have been made. 

a Assume that the operation has been going on since time immemorial, 
so that we have a steady state in which A(z), B(z), and C(z) are the
same at the beginning of each year. Find closed forms for these 
generating functions. 

b Find the mean and standard deviation of the age of the wine in each 
barrel, under the same assumptions. What is the average age of the 
sherry when it is bottled? How much of it is exactly 25 years old? 
c Now take the finiteness of time into account: Suppose that all three 
barrels contained new wine at the beginning of year 0. What is the 
average age of the sherry that is bottled at the beginning of year n? 

46 Stefan Banach used to carry two boxes of matches, each containing 
n matches initially. Whenever he needed a light he chose a box at random, each with probability $\frac12$, independent of his previous choices. After
taking out a match he’d put the box back in its pocket (even if the box 
became empty — all famous mathematicians used to do this). When his 
chosen box was empty he’d throw it away and reach for the other box. 
a Once he found that the other box was empty too. What’s the probability that this occurs? (For $n = 1$ it happens half the time and for $n = 2$ it happens 3/8 of the time.) To answer this part, find a closed form for the generating function $P(w,z) = \sum_{m,n} p_{m,n} w^m z^n$, where $p_{m,n}$ is the probability that, starting with $m$ matches in one box and $n$ in the other, both boxes are empty when an empty box is first chosen. Then find a closed form for $p_{n,n}$.
b Generalizing your answer to part (a), find a closed form for the probability that exactly $k$ matches are in the other box when an empty one is first thrown away.

c Find a closed form for the average number of matches in that other box.

47 Some physicians, collaborating with some physicists, recently discovered 
a pair of microbes that reproduce in a peculiar way. The male microbe, 
called a diphage, has two receptors on its surface; the female microbe, 
called a triphage, has three: 


[Figure: a diphage (two receptors) and a triphage (three receptors); receptor: □]


When a culture of diphages and triphages is irradiated with a psi-particle, 
exactly one of the receptors on one of the phages absorbs the particle; 
each receptor is equally likely. If it was a diphage receptor, that diphage 
changes to a triphage; if it was a triphage receptor, that triphage splits 
into two diphages. Thus if an experiment starts with one diphage, the 
first psi-particle changes it to a triphage, the second particle splits the 
triphage into two diphages, and the third particle changes one of the 
diphages to a triphage. The fourth particle hits either the diphage or 
the triphage; then there are either two triphages (probability 2/5) or three diphages (probability 3/5). Find a closed form for the average number
of diphages present, if we begin with a single diphage and irradiate the 
culture n times with single psi-particles. 

48 Five people stand at the vertices of a pentagon, throwing frisbees to each 
other. 


And for the number 
in the empty box. 


Or, if this pentagon 
is in Arlington, 
throwing missiles 
at each other. 
Frisbee is a
trademark of Wham-O
Manufacturing
Company.


They have two frisbees, initially at adjacent vertices as shown. In each 
time interval, each frisbee is thrown either to the left or to the right 
(along an edge of the pentagon) with equal probability. This process 
continues until one person is the target of two frisbees simultaneously; 
then the game stops. (All throws are independent of past history.) 
a Find the mean and variance of the number of pairs of throws.
b Find a closed form for the probability that the game lasts more than 
100 steps, in terms of Fibonacci numbers. 

49 Luke Snowwalker spends winter vacations at his mountain cabin. The 
front porch has m pairs of boots and the back porch has n pairs. Every 
time he goes for a walk he flips a (fair) coin to decide whether to leave 
from the front porch or the back, and he puts on a pair of boots at that 
porch and heads off. There’s a 50/50 chance that he returns to each 
porch, independent of his starting point, and he leaves the boots at the 
porch he returns to. Thus after one walk there will be m + [−1, 0, or +1] pairs on the front porch and n − [−1, 0, or +1] pairs on the back porch. If all the boots pile up on one porch and if he decides to leave from the other, he goes without boots and gets frostbite, ending his vacation. Assuming that he continues his walks until the bitter end, let $P_N(m,n)$ be the probability that he completes exactly $N$ nonfrostbitten trips, starting with $m$ pairs on the front porch and $n$ on the back. Thus, if both $m$ and $n$ are positive,

$$P_N(m,n) = \tfrac14 P_{N-1}(m-1,n+1) + \tfrac12 P_{N-1}(m,n) + \tfrac14 P_{N-1}(m+1,n-1) ;$$

this follows because this first trip is either front/back, front/front, back/back, or back/front, each with probability $\frac14$, and $N-1$ trips remain.
a Complete the recurrence for $P_N(m,n)$ by finding formulas that hold when $m = 0$ or $n = 0$. Use the recurrence to obtain equations that hold among the probability generating functions

$$g_{m,n}(z) = \sum_{N\ge0} P_N(m,n)\, z^N .$$

b Differentiate your equations and set $z = 1$, thereby obtaining relations among the quantities $g'_{m,n}(1)$. Solve these equations, thereby determining the mean number of trips before frostbite.
c Show that $g_{m,n}$ has a closed form if we substitute $z = 1/\cos^2\theta$:

$$g_{m,n}\Bigl(\frac{1}{\cos^2\theta}\Bigr) = \frac{\sin(2m+1)\theta + \sin(2n+1)\theta}{\sin(2m+2n+2)\theta}\,\cos\theta .$$
50 Consider the function

$$H(z) = 1 + \frac{1-z}{2z}\Bigl(z - 3 + \sqrt{(1-z)(9-z)}\Bigr) .$$

The purpose of this problem is to prove that $H(z) = \sum_{k\ge0} h_k z^k$ is a probability generating function, and to obtain some basic facts about it.
a Let $(1-z)^{3/2}(9-z)^{1/2} = \sum_{k\ge0} c_k z^k$. Prove that $c_0 = 3$, $c_1 = -14/3$, $c_2 = 37/27$, and $c_{3+l} = 3\sum_k \binom{l}{k}\binom{1/2}{k+3}\bigl(\frac89\bigr)^{k+3}$ for all $l \ge 0$.

Hint: Use the identity

$$(9-z)^{1/2} = 3(1-z)^{1/2}\Bigl(1 + \frac{8}{9}\,\frac{z}{1-z}\Bigr)^{1/2}$$

and expand the last factor in powers of $z/(1-z)$.
b Use part (a) and exercise 5.81 to show that the coefficients of $H(z)$ are all positive.

c Prove the amazing identity

$$\sqrt{\frac{9-H(z)}{1-H(z)}} = \sqrt{\frac{9-z}{1-z}} + 2 .$$

d What are the mean and variance of $H$?


51 The state lottery in El Dorado uses the payoff distribution H defined 
in the previous problem. Each lottery ticket costs 1 doubloon, and the 
payoff is k doubloons with probability $h_k$. Your chance of winning with
each ticket is completely independent of your chance with other tickets; 
in other words, winning or losing with one ticket does not affect your 
probability of winning with any other ticket you might have purchased 
in the same lottery. 

a Suppose you start with one doubloon and play this game. If you win 
k doubloons, you buy k tickets in the second game; then you take 
the total winnings in the second game and apply all of them to the 
third; and so on. If none of your tickets is a winner, you’re broke 
and you have to stop gambling. Prove that the pgf of your current 
holdings after $n$ rounds of such play is

$$1 - \frac{4}{\sqrt{(9-z)/(1-z)} + 2n - 1} + \frac{4}{\sqrt{(9-z)/(1-z)} + 2n + 1} .$$

b Let $g_n$ be the probability that you lose all your money for the first time on the $n$th game, and let $G(z) = g_1 z + g_2 z^2 + \cdots$. Prove that $G(1) = 1$. (This means that you’re bound to lose sooner or later, with probability 1, although you might have fun playing in the meantime.) What are the mean and the variance of $G$?
A doubledoubloon.


c What is the average total number of tickets you buy, if you continue 
to play until going broke? 

d What is the average number of games until you lose everything if 
you start with two doubloons instead of just one? 

Bonus problems 

52 Show that the text’s definitions of median and mode for random variables 
correspond in some meaningful sense to the definitions of median and 
mode for sequences, when the probability space is finite. 

53 Prove or disprove: If X, Y, and Z are random variables with the property 
that all three pairs (X, Y), (X, Z) and (Y, Z) are independent, then X + Y 
is independent of Z. 

54 Equation (8.20) proves that the average value of $\hat VX$ is $VX$. What is the variance of $\hat VX$?

55 A normal deck of playing cards contains 52 cards, four each with face 
values in the set {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}. Let X and Y denote 
the respective face values of the top and bottom cards, and consider the 
following algorithm for shuffling: 

S1 Permute the deck randomly so that each arrangement occurs with probability 1/52!.

S2 If $X \ne Y$, flip a biased coin that comes up heads with probability $p$, and go back to step S1 if heads turns up. Otherwise stop.

Each coin flip and each permutation is assumed to be independent of all the other randomizations. What value of $p$ will make $X$ and $Y$ independent random variables after this procedure stops?

56 Generalize the frisbee problem of exercise 48 from a pentagon to an 
m-gon. What are the mean and variance of the number of collision-free 
throws in general, when the frisbees are initially at adjacent vertices? 
Show that, if m is odd, the pgf for the number of throws can be written 
as a product of coin-flipping distributions: 


$$G_m(z) = \prod_{k=1}^{(m-1)/2} \frac{p_k z}{1 - q_k z} , \qquad\text{where } p_k = \sin^2\frac{(2k-1)\pi}{2m} ,\quad q_k = \cos^2\frac{(2k-1)\pi}{2m} .$$

Hint: Try the substitution $z = 1/\cos^2\theta$.

57 Prove that the Penney-ante pattern $\tau_1\tau_2\ldots\tau_{l-1}\tau_l$ is always inferior to the pattern $\overline{\tau}_2\tau_1\tau_2\ldots\tau_{l-1}$ (where $\overline{\tau}_2$ is the opposite of $\tau_2$) when a fair coin is flipped, if $l \ge 3$.

58 Is there any sequence $A = \tau_1\tau_2\ldots\tau_{l-1}\tau_l$ of $l \ge 3$ heads and tails such that the sequences $\mathrm{H}\tau_1\tau_2\ldots\tau_{l-1}$ and $\mathrm{T}\tau_1\tau_2\ldots\tau_{l-1}$ both perform equally well against $A$ in the game of Penney ante?

59 Are there patterns A and B of heads and tails such that A is longer 
than B, yet A appears before B more than half the time when a fair coin 
is being flipped? 

60 Let k and n be fixed positive integers with k < n. 

a Find a closed form for the probability generating function

$$G(w,z) = \frac{1}{m^n}\sum_{h_1=1}^{m}\cdots\sum_{h_n=1}^{m} w^{P(h_1,\ldots,h_n;k)}\, z^{P(h_1,\ldots,h_n;n)}$$

for the joint distribution of the numbers of probes needed to find the $k$th and $n$th items that have been inserted into a hash table with $m$ lists.

b Although the random variables $P(h_1,\ldots,h_n;k)$ and $P(h_1,\ldots,h_n;n)$ are dependent, show that they are somewhat independent:

$$E\bigl(P(h_1,\ldots,h_n;k)\,P(h_1,\ldots,h_n;n)\bigr) = \bigl(E\,P(h_1,\ldots,h_n;k)\bigr)\bigl(E\,P(h_1,\ldots,h_n;n)\bigr) .$$

61 Use the result of the previous exercise to prove (8.104). 

62 Continuing exercise 47, find the variance of the number of diphages after 
n irradiations. 

Research problem 

63 The normal distribution is a non-discrete probability distribution char- 
acterized by having all its cumulants zero except the mean and the vari- 
ance. Is there an easy way to tell if a given sequence of cumulants 
$(\kappa_1, \kappa_2, \kappa_3, \ldots)$ comes from a discrete distribution? (All the probabilities
must be “atomic” in a discrete distribution.) 



Uh oh . . . here
comes that A-word.


9



Asymptotics 


EXACT ANSWERS are great when we can find them; there’s something
very satisfying about complete knowledge. But there’s also a time when 
approximations are in order. If we run into a sum or a recurrence whose 
solution doesn’t have a closed form (as far as we can tell), we still would like 
to know something about the answer; we don’t have to insist on all or nothing. 
And even if we do have a closed form, our knowledge might be imperfect, since 
we might not know how to compare it with other closed forms. 

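For example, there is (apparently) no closed form for the sum

$$S_n = \sum_{k=0}^{n}\binom{3n}{k} .$$

But it is nice to know that

$$S_n \sim 2\binom{3n}{n} , \qquad\text{as } n\to\infty ;$$

we say that the sum is “asymptotic to” $2\binom{3n}{n}$. It’s even nicer to have more detailed information, like

$$S_n = \binom{3n}{n}\Bigl(2 - \frac{4}{n} + O\Bigl(\frac{1}{n^2}\Bigr)\Bigr) , \qquad (9.1)$$

which gives us a “relative error of order $1/n^2$.” But even this isn’t enough to tell us how big $S_n$ is, compared with other quantities. Which is larger, $S_n$ or the Fibonacci number $F_{4n}$? Answer: We have $S_2 = 22 > F_8 = 21$ when $n = 2$; but $F_{4n}$ is eventually larger, because $F_{4n} \sim \phi^{4n}/\sqrt5$ and $\phi^4 \approx 6.8541$, while

$$S_n = \sqrt{\frac{3}{\pi n}}\,\Bigl(\frac{27}{4}\Bigr)^{n}\Bigl(1 - \frac{151}{72n} + O\Bigl(\frac{1}{n^2}\Bigr)\Bigr) . \qquad (9.2)$$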

Our goal in this chapter is to learn how to understand and to derive results 
like this without great pain. 
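The claims about $S_n$ are easy to probe numerically. The Python sketch below (helper names are hypothetical) computes $S_n$ exactly with integer arithmetic, checks $S_2 = 22 > F_8 = 21$, and measures $n$ times the relative error left over after the main terms of (9.1) and (9.2); if those formulas are right, the two scaled errors should drift toward 4 and $151/72 \approx 2.097$.

```python
import math


def S(n):
    """S_n = sum of binomial(3n, k) for 0 <= k <= n, in exact integer arithmetic."""
    total, term = 0, 1                        # term = binomial(3n, k), starting at k = 0
    for k in range(n + 1):
        total += term
        term = term * (3 * n - k) // (k + 1)  # binomial(3n, k + 1); the division is exact
    return total


def fib(m):
    """Fibonacci number F_m with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


def scaled_error_9_1(n):
    """n * (2 - S_n / binomial(3n, n)); (9.1) predicts that this tends to 4."""
    return n * (2 - S(n) / math.comb(3 * n, n))


def scaled_error_9_2(n):
    """n * (1 - S_n / (sqrt(3/(pi n)) (27/4)^n)); (9.2) predicts the limit 151/72.

    Logarithms are used so that (27/4)^n never has to be formed as a float.
    """
    log_main = 0.5 * math.log(3 / (math.pi * n)) + n * math.log(27 / 4)
    return n * (1 - math.exp(math.log(S(n)) - log_main))
```

The crossover happens early: already at $n = 3$ we have $S_3 = 130 < F_{12} = 144$, and the gap widens from there, since $27/4 = 6.75 < \phi^4$.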
The word asymptotic stems from a Greek root meaning “not falling together.” When ancient Greek mathematicians studied conic sections, they considered hyperbolas like the graph of $y = \sqrt{1+x^2}$,

which has the lines $y = x$ and $y = -x$ as “asymptotes.” The curve approaches but never quite touches these asymptotes, when $x \to \pm\infty$. Nowadays we use
“asymptotic” in a broader sense to mean any approximate value that gets 
closer and closer to the truth, when some parameter approaches a limiting 
value. For us, asymptotics means “almost falling together.” 

Some asymptotic formulas are very difficult to derive, well beyond the 
scope of this book. We will content ourselves with an introduction to the sub- 
ject; we hope to acquire a suitable foundation on which further techniques can 
be built. We will be particularly interested in understanding the definitions 
of and ‘O’ and similar symbols, and we’ll study basic ways to manipulate 
asymptotic quantities. 

9.1 A HIERARCHY 

Functions of n that occur in practice usually have different “asymp- 
totic growth ratios”; one of them will approach infinity faster than another. 
We formalize this by saying that 

$$f(n) \prec g(n) \iff \lim_{n\to\infty}\frac{f(n)}{g(n)} = 0 . \qquad (9.3)$$

This relation is transitive: If $f(n) \prec g(n)$ and $g(n) \prec h(n)$ then $f(n) \prec h(n)$. We also may write $g(n) \succ f(n)$ if $f(n) \prec g(n)$. This notation was introduced in 1871 by Paul du Bois-Reymond [85].

For example, $n \prec n^2$; informally we say that $n$ grows more slowly than $n^2$. In fact,

$$n^\alpha \prec n^\beta \iff \alpha < \beta , \qquad (9.4)$$

when $\alpha$ and $\beta$ are arbitrary real numbers.

There are, of course, many functions of $n$ besides powers of $n$. We can use the $\prec$ relation to rank lots of functions into an asymptotic pecking order


Other words like 
‘symptom’ and 
‘ptomaine’ also 
come from this root. 


All functions 
great and small. 
A loerarchy?


that includes entries like this: 


$$1 \prec \log\log n \prec \log n \prec n^\epsilon \prec n^c \prec n^{\log n} \prec c^n \prec n^n \prec c^{c^n} .$$

(Here $\epsilon$ and $c$ are arbitrary constants with $0 < \epsilon < 1 < c$.)

All functions listed here, except 1, go to infinity as n goes to infinity. 
Thus when we try to place a new function in this hierarchy, we’re not trying 
to determine whether it becomes infinite but rather how fast. 

It helps to cultivate an expansive attitude when we’re doing asymptotic analysis: We should think big, when imagining a variable that approaches infinity. For example, the hierarchy says that $\log n \prec n^{0.0001}$; this might seem wrong if we limit our horizons to teeny-tiny numbers like one googol, $n = 10^{100}$. For in that case, $\log n = 100$, while $n^{0.0001}$ is only $10^{0.01} \approx 1.0233$. But if we go up to a googolplex, $n = 10^{10^{100}}$, then $\log n = 10^{100}$ pales in comparison with $n^{0.0001} = 10^{10^{96}}$.

Even if $\epsilon$ is extremely small (smaller than, say, $1/10^{10^{100}}$), the value of $\log n$ will be much smaller than the value of $n^\epsilon$, if $n$ is large enough. For if we set $n = 10^{10^{2k}}$, where $k$ is so large that $\epsilon \ge 10^{-k}$, we have $\log n = 10^{2k}$ but $n^\epsilon \ge 10^{10^k}$. The ratio $(\log n)/n^\epsilon$ therefore approaches zero as $n \to \infty$.
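Numbers like a googolplex cannot be stored directly, but the comparisons in these two paragraphs only involve exponents. This Python sketch (hypothetical helpers, assuming base-10 logarithms as the googol example implies) represents $n = 10^E$ by the exact integer $E$, so that $\log n = E$ and $\log_{10}(n^\epsilon) = \epsilon E$, and then evaluates $\log_{10}\bigl((\log n)/n^\epsilon\bigr)$ exactly for $n = 10^{10^{2k}}$.

```python
from fractions import Fraction


def log_and_power_exponent(E, eps):
    """For n = 10**E, return (log n, log10(n**eps)) exactly; log is base 10."""
    return E, Fraction(eps) * E


def log10_ratio(k, eps):
    """log10((log n) / n**eps) for n = 10**(10**(2k)), as an exact Fraction.

    Here log n = 10**(2k), whose base-10 logarithm is exactly 2k.
    """
    return 2 * k - Fraction(eps) * 10 ** (2 * k)
```

With $\epsilon = 1/1000$, `log10_ratio(1, eps)` is $19/10 > 0$, so the logarithm still wins when $n = 10^{100}$; from $k = 2$ on the values are $-6$, $-994$, $-99992$, $\ldots$, heading to $-\infty$ as the argument requires.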

The hierarchy shown above deals with functions that go to infinity. Often, 
however, we’re interested in functions that go to zero, so it’s useful to have 
a similar hierarchy for those functions. We get one by taking reciprocals, 
because when $f(n)$ and $g(n)$ are never zero we have

$$f(n) \prec g(n) \iff \frac{1}{g(n)} \prec \frac{1}{f(n)} . \qquad (9.5)$$


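Thus, for example, the following functions (except 1) all go to zero:

$$\frac{1}{c^{c^n}} \prec \frac{1}{n^n} \prec \frac{1}{c^n} \prec \frac{1}{n^{\log n}} \prec \frac{1}{n^c} \prec \frac{1}{n^\epsilon} \prec \frac{1}{\log n} \prec \frac{1}{\log\log n} \prec 1 .$$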

Let’s look at a few other functions to see where they fit in. The number
π(n) of primes less than or equal to n is known to be approximately n/ln n.
Since $1/n^{\epsilon} \prec 1/\ln n \prec 1$, multiplying by n tells us that

$$n^{1-\epsilon} \prec \pi(n) \prec n.$$


We can in fact generalize (9.4) by noticing, for example, that

$$n^{\alpha_1}(\log n)^{\alpha_2}(\log\log n)^{\alpha_3} \prec n^{\beta_1}(\log n)^{\beta_2}(\log\log n)^{\beta_3} \iff (\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3). \tag{9.6}$$

Here ‘$(\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3)$’ means lexicographic order (dictionary order);
in other words, either $\alpha_1 < \beta_1$, or $\alpha_1 = \beta_1$ and $\alpha_2 < \beta_2$, or $\alpha_1 = \beta_1$
and $\alpha_2 = \beta_2$ and $\alpha_3 < \beta_3$.
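Rule (9.6) turns a growth-rate comparison into a comparison of exponent triples. Python compares tuples in exactly the lexicographic order just described. This is a small sketch, assuming both functions have the product form of (9.6); the name `slower` is hypothetical.

```python
def slower(alpha, beta):
    """Return True if f(n) ≺ g(n) according to rule (9.6), where
    f(n) = n**a1 * (log n)**a2 * (log log n)**a3 with (a1, a2, a3) = alpha,
    and g(n) has the same form with exponents beta.

    Python compares tuples lexicographically, which is the dictionary
    order that (9.6) calls for.
    """
    return tuple(alpha) < tuple(beta)

# n (log n)**100 grows more slowly than n**1.0001 ...
print(slower((1, 100, 0), (1.0001, 0, 0)))   # True
# ... and n log n / log log n grows more slowly than n log n.
print(slower((1, 1, -1), (1, 1, 0)))         # True
print(slower((2, 0, 0), (1, 50, 50)))        # False
```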
How about the function $e^{\sqrt{\log n}}$; where does it live in the hierarchy? We
can answer questions like this by using the rule

$$e^{f(n)} \prec e^{g(n)} \iff \lim_{n\to\infty}\bigl(f(n) - g(n)\bigr) = -\infty, \tag{9.7}$$

which follows in two steps from definition (9.3) by taking logarithms. Consequently

$$1 \prec f(n) \prec g(n) \implies e^{|f(n)|} \prec e^{|g(n)|}.$$

And since $1 \prec \log\log n \prec \sqrt{\log n} \prec \epsilon\log n$, we have $\log n \prec e^{\sqrt{\log n}} \prec n^{\epsilon}$.

When two functions f(n) and g(n) have the same rate of growth, we
write ‘f(n) ≍ g(n)’. The official definition is:

$$f(n) \asymp g(n) \iff |f(n)| \le C|g(n)| \text{ and } |g(n)| \le C|f(n)|, \text{ for some } C \text{ and for all sufficiently large } n. \tag{9.8}$$

This holds, for example, if f(n) is constant and g(n) = cos n + arctan n. We
will prove shortly that it holds whenever f(n) and g(n) are polynomials of
the same degree. There’s also a stronger relation, defined by the rule

$$f(n) \sim g(n) \iff \lim_{n\to\infty}\frac{f(n)}{g(n)} = 1. \tag{9.9}$$

In this case we say that “f(n) is asymptotic to g(n).”
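Here is a quick numerical look at the example f(n) = 1, g(n) = cos n + arctan n. Since cos n ≤ 1 and arctan n < π/2, g(n) is bounded above. Since arctan n climbs toward π/2 > 1, g(n) also stays away from 0. The sketch below uses a hypothetical helper, `ratio_bounds`, and samples only finitely many n, so it illustrates (9.8) rather than proving it.

```python
import math

def ratio_bounds(N: int):
    """Min and max of g(n) = cos n + arctan n over 1 <= n <= N.

    If both stay bounded away from 0 and infinity, the constant f(n) = 1
    satisfies (9.8) against g(n) with C = max(hi, 1/lo).
    """
    values = [math.cos(n) + math.atan(n) for n in range(1, N + 1)]
    return min(values), max(values)

lo, hi = ratio_bounds(100_000)
C = max(hi, 1 / lo)
print(lo, hi, C)
```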

G. H. Hardy [179] introduced an interesting and important concept called
the class of logarithmico-exponential functions, defined recursively as the
smallest family ℒ of functions satisfying the following properties:

• The constant function f(n) = α is in ℒ, for all real α.

• The identity function f(n) = n is in ℒ.

• If f(n) and g(n) are in ℒ, so is f(n) − g(n).

• If f(n) is in ℒ, so is $e^{f(n)}$.

• If f(n) is in ℒ and is “eventually positive,” then ln f(n) is in ℒ.

A function f(n) is called “eventually positive” if there is an integer $n_0$ such
that f(n) > 0 whenever $n \ge n_0$.

We can use these rules to show, for example, that f(n) + g(n) is in ℒ
whenever f(n) and g(n) are, because f(n) + g(n) = f(n) − (0 − g(n)). If f(n)
and g(n) are eventually positive members of ℒ, their product $f(n)g(n) =
e^{\ln f(n) + \ln g(n)}$ and quotient $f(n)/g(n) = e^{\ln f(n) - \ln g(n)}$ are in ℒ; so are functions
like $\sqrt{f(n)} = e^{\frac12 \ln f(n)}$, etc. Hardy proved that every logarithmico-exponential
function is eventually positive, eventually negative, or identically
zero. Therefore the product and quotient of any two ℒ-functions are in ℒ,
except that we cannot divide by a function that’s identically zero.
Hardy’s main theorem about logarithmico-exponential functions is that
they form an asymptotic hierarchy: If f(n) and g(n) are any functions in ℒ,
then either f(n) ≺ g(n), or f(n) ≻ g(n), or f(n) ≍ g(n). In the last case
there is, in fact, a constant α such that

$$f(n) \sim \alpha\, g(n).$$

The proof of Hardy’s theorem is beyond the scope of this book; but it’s nice
to know that the theorem exists, because almost every function we ever need
to deal with is in ℒ. In practice, we can generally fit a given function into a
given hierarchy without great difficulty.


9.2 O NOTATION 


“. . . by the sign O(n)
we express a quantity
whose order with
respect to n does
not exceed the order
of n; whether it really
contains terms of
order n remains an
open question with
the reasoning used
so far.”

— P. Bachmann [17]


A wonderful notational convention for asymptotic analysis was in- 
troduced by Paul Bachmann in 1894 and popularized in subsequent years by 
Edmund Landau and others. We have seen it in formulas like 

$$H_n = \ln n + \gamma + O(1/n), \tag{9.10}$$

which tells us that the nth harmonic number is equal to the natural logarithm 
of n plus Euler’s constant, plus a quantity that is “Big Oh of 1 over n.” This 
last quantity isn’t specified exactly; but whatever it is, the notation claims 
that its absolute value is no more than a constant times 1 /n. 
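To see what the O(1/n) in (9.10) is hiding, we can compute $H_n - \ln n - \gamma$ directly and multiply by n; if (9.10) holds, the product must stay bounded. This sketch hard-codes Euler’s constant to double precision, and the helper name `harmonic` is our own.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, to double precision

def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum for accuracy."""
    return math.fsum(1 / k for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    err = harmonic(n) - math.log(n) - EULER_GAMMA
    # (9.10) claims |err| <= C/n, so n * err must stay bounded.
    print(n, err, n * err)
```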

The beauty of O-notation is that it suppresses unimportant detail and
lets us concentrate on salient features: The quantity O(1/n) is negligibly
small, if constant multiples of 1/n are unimportant.

Furthermore we get to use O right in the middle of a formula. If we want
to express (9.10) in terms of the notations in Section 9.1, we must transpose
‘ln n + γ’ to the left side and specify a weaker result like

$$H_n - \ln n - \gamma \prec \frac{\log\log n}{n}$$

or a stronger result like

$$H_n - \ln n - \gamma \asymp \frac{1}{n}.$$

The Big Oh notation allows us to specify an appropriate amount of detail 
in place, without transposition. 

The idea of imprecisely specified quantities can be made clearer if we 
consider some additional examples. We occasionally use the notation ‘±1 ’ to 
stand for something that is either +1 or —1; we don’t know (or perhaps we 
don’t care) which it is, yet we can manipulate it in formulas. 
N. G. de Bruijn begins his book Asymptotic Methods in Analysis [74] by
considering a Big Ell notation that helps us understand Big Oh. If we write
L(5) for a number whose absolute value is less than 5 (but we don’t say what
the number is), then we can perform certain calculations without knowing
the full truth. For example, we can deduce formulas such as 1 + L(5) = L(6);
L(2) + L(3) = L(5); L(2)L(3) = L(6); $e^{L(5)} = L(e^5)$; and so on. But we cannot
conclude that L(5) − L(3) = L(2), since the left side might be 4 − 0. In fact,
the most we can say is L(5) − L(3) = L(8).

Bachmann’s O-notation is similar to L-notation but it’s even less precise:
O(α) stands for a number whose absolute value is at most a constant times |α|.
We don’t say what the number is and we don’t even say what the constant is.
Of course the notion of a “constant” is nonsense if there is nothing variable
in the picture, so we use O-notation only in contexts when there’s at least
one quantity (say n) whose value is varying. The formula

$$f(n) = O(g(n)) \qquad \text{for all } n \tag{9.11}$$

means in this context that there is a constant C such that

$$|f(n)| \le C|g(n)| \qquad \text{for all } n; \tag{9.12}$$

and when O(g(n)) stands in the middle of a formula it represents a function
f(n) that satisfies (9.12). The values of f(n) are unknown, but we do know
that they aren’t too large. Similarly, de Bruijn’s ‘L(n)’ represents an unspecified
function f(n) whose values satisfy |f(n)| < |n|. The main difference
between L and O is that O-notation involves an unspecified constant C; each
appearance of O might involve a different C, but each C is independent of n.
For example, we know that the sum of the first n squares is

$$\Box_n = \tfrac13 n(n + \tfrac12)(n + 1) = \tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n.$$

We can write

$$\Box_n = O(n^3)$$

because $|\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n| \le \tfrac13|n|^3 + \tfrac12|n|^2 + \tfrac16|n| \le \tfrac13|n^3| + \tfrac12|n^3| + \tfrac16|n^3| = |n^3|$
for all integers n. Similarly, we have the more specific formula

$$\Box_n = \tfrac13 n^3 + O(n^2);$$

we can also be sloppy and throw away information, saying that

$$\Box_n = O(n^{10}).$$


It’s not nonsense, 
but it is pointless. 


I’ve got a little 
list — I’ve got a 
little list, 

Of annoying terms 
and details that 
might well be under 
ground, 

And that never 
would be missed — 
that never would be 
missed. 


Nothing in the definition of O requires us to give a best possible bound. 
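The constants claimed for $\Box_n$ can be spot-checked exactly with rational arithmetic. This sketch uses a function name of our own, `square_pyramidal`. It checks the closed form against direct summation, then checks $|\Box_n| \le |n^3|$ and $|\Box_n - \tfrac13 n^3| \le n^2$ on a finite range of integers. For negative n, the closed-form polynomial serves as the definition. A finite check is evidence, not a proof.

```python
from fractions import Fraction

def square_pyramidal(n: int) -> Fraction:
    """Closed form n(n + 1/2)(n + 1)/3 for the sum of the first n squares."""
    return Fraction(n) * (n + Fraction(1, 2)) * (n + 1) / 3

# The closed form agrees with direct summation for n >= 0.
assert all(square_pyramidal(n) == sum(k * k for k in range(n + 1)) for n in range(200))

# The bounds from the text, with the constants derived there:
#   |box_n|           <= 1 * |n^3|   (so box_n = O(n^3), for all integers n)
#   |box_n - n^3/3|   <= 1 * n^2     (so box_n = n^3/3 + O(n^2))
for n in range(-1000, 1001):
    s = square_pyramidal(n)
    assert abs(s) <= abs(n) ** 3
    assert abs(s - Fraction(n) ** 3 / 3) <= n * n
print("bounds hold for -1000 <= n <= 1000")
```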
But wait a minute. What if the variable n isn’t an integer? What if we
have a formula like $S(x) = \tfrac13 x^3 + \tfrac12 x^2 + \tfrac16 x$, where x is a real number? Then we
cannot say that $S(x) = O(x^3)$, because the ratio $S(x)/x^3 = \tfrac13 + \tfrac12 x^{-1} + \tfrac16 x^{-2}$
becomes unbounded when x → 0. And we cannot say that S(x) = O(x),
because the ratio $S(x)/x = \tfrac13 x^2 + \tfrac12 x + \tfrac16$ becomes unbounded when x → ∞.
So we apparently can’t use O-notation with S(x).

The answer to this dilemma is that variables used with O are generally
subject to side conditions. For example, if we stipulate that |x| ≥ 1, or that
|x| ≥ ε where ε is any positive constant, or that x is an integer, then we can
write $S(x) = O(x^3)$. If we stipulate that |x| ≤ 1, or that |x| ≤ c where c is
any positive constant, then we can write S(x) = O(x). The O-notation is
governed by its environment, by constraints on the variables involved.
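The same polynomial on real arguments shows why side conditions matter. Without constraints, the ratios $S(x)/x^3$ and $S(x)/x$ blow up. Under |x| ≥ 1 and |x| ≤ 1 respectively, each stays at most 1, since 1/3 + 1/2 + 1/6 = 1. The sample points below are arbitrary choices of ours.

```python
def S(x: float) -> float:
    """S(x) = x^3/3 + x^2/2 + x/6 for real x."""
    return x**3 / 3 + x**2 / 2 + x / 6

# No side condition: each ratio is unbounded somewhere.
near_zero = [S(x) / x**3 for x in (1e-1, 1e-3, 1e-5)]  # grows as x -> 0
near_inf = [S(x) / x for x in (1e1, 1e3, 1e5)]         # grows as x -> infinity
print(near_zero)
print(near_inf)

# With |x| >= 1 the first ratio is at most 1; with 0 < |x| <= 1 the second is.
big = [s * 10 ** (k / 10) for k in range(60) for s in (1, -1)]
small = [s * 10 ** (-k / 10) for k in range(60) for s in (1, -1)]
print(max(abs(S(x) / x**3) for x in big))
print(max(abs(S(x) / x) for x in small))
```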

These constraints are often specified by a limiting relation. For example,
we might say that

$$f(n) = O(g(n)) \qquad \text{as } n \to \infty. \tag{9.13}$$

This means that the O-condition is supposed to hold when n is “near” ∞;
we don’t care what happens unless n is quite large. Moreover, we don’t
even specify exactly what “near” means; in such cases each appearance of O
implicitly asserts the existence of two constants C and $n_0$, such that

$$|f(n)| \le C|g(n)| \qquad \text{whenever } n \ge n_0. \tag{9.14}$$


You are the fairest 
of your sex, 

Let me be your 
hero; 

I love you as 
one over x, 

As x approaches 
zero. 

Positively. 


The values of C and $n_0$ might be different for each O, but they do not depend
on n. Similarly, the notation

$$f(x) = O(g(x)) \qquad \text{as } x \to 0$$

means that there exist two constants C and ε such that

$$|f(x)| \le C|g(x)| \qquad \text{whenever } |x| \le \epsilon. \tag{9.15}$$

The limiting value does not have to be ∞ or 0; we can write

$$\ln z = z - 1 + O\bigl((z - 1)^2\bigr) \qquad \text{as } z \to 1,$$

because it can be proved that $|\ln z - z + 1| \le |z - 1|^2$ when $|z - 1| \le \tfrac12$.
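The inequality $|\ln z - z + 1| \le |z - 1|^2$ for $|z - 1| \le \tfrac12$ can be probed numerically over complex z. This sketch samples a polar grid around z = 1; the grid size is an arbitrary choice, and the helper name `worst_ratio` is ours. Python’s `cmath.log` is the principal logarithm. In this disk the real part of z is at least 1/2, so the principal logarithm agrees with the power series for ln z.

```python
import cmath

def worst_ratio(radius: float = 0.5, steps: int = 200) -> float:
    """Largest |ln z - z + 1| / |z - 1|^2 over a polar grid with 0 < |z - 1| <= radius."""
    worst = 0.0
    for i in range(1, steps + 1):
        r = radius * i / steps
        for j in range(steps):
            w = cmath.rect(r, 2 * cmath.pi * j / steps)  # w = z - 1
            z = 1 + w
            worst = max(worst, abs(cmath.log(z) - z + 1) / abs(w) ** 2)
    return worst

print(worst_ratio())  # the text's inequality requires this to be at most 1
```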

Our definition of O has gradually developed, over a few pages, from some- 
thing that seemed pretty obvious to something that seems rather complex; we 
now have O representing an undefined function and either one or two unspec- 
ified constants, depending on the environment. This may seem complicated 
enough for any reasonable notation, but it’s still not the whole story! Another 
subtle consideration lurks in the background. Namely, we need to realize that
it’s fine to write

$$\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n = O(n^3),$$

but we should never write this equality with the sides reversed. Otherwise
we could deduce ridiculous things like $n = n^2$ from the identities $n = O(n^2)$
and $n^2 = O(n^2)$. When we work with O-notation and any other formulas
that involve imprecisely specified quantities, we are dealing with one-way 
equalities. The right side of an equation does not give more information than 
the left side, and it may give less; the right is a “crudification” of the left. 

From a strictly formal point of view, the notation O(g(n)) does not
stand for a single function f(n), but for the set of all functions f(n) such
that |f(n)| ≤ C|g(n)| for some constant C. An ordinary formula g(n) that
doesn’t involve O-notation stands for the set containing a single function
f(n) = g(n). If S and T are sets of functions of n, the notation S + T stands
for the set of all functions of the form f(n) + g(n), where f(n) ∈ S and
g(n) ∈ T; other notations like S − T, ST, S/T, √S, $e^S$, ln S are defined similarly.
Then an “equation” between such sets of functions is, strictly speaking, a set
inclusion; the ‘=’ sign really means ‘⊆’. These formal definitions put all of
our O manipulations on firm logical ground.

For example, the “equation”

$$\tfrac13 n^3 + O(n^2) = O(n^3)$$

means that $S_1 \subseteq S_2$, where $S_1$ is the set of all functions of the form $\tfrac13 n^3 + f_1(n)$
such that there exists a constant $C_1$ with $|f_1(n)| \le C_1|n^2|$, and where $S_2$
is the set of all functions $f_2(n)$ such that there exists a constant $C_2$ with
$|f_2(n)| \le C_2|n^3|$. We can formally prove this “equation” by taking an arbitrary
element of the left-hand side and showing that it belongs to the right-hand
side: Given $\tfrac13 n^3 + f_1(n)$ such that $|f_1(n)| \le C_1|n^2|$, we must prove
that there’s a constant $C_2$ such that $|\tfrac13 n^3 + f_1(n)| \le C_2|n^3|$. The constant
$C_2 = \tfrac13 + C_1$ does the trick, since $n^2 \le |n^3|$ for all integers n.

If ‘=’ really means ‘⊆’, why don’t we use ‘⊆’ instead of abusing the equals
sign? There are four reasons.

First, tradition. Number theorists started using the equals sign with O- 
notation and the practice stuck. It’s sufficiently well established by now that 
we cannot hope to get the mathematical community to change. 

Second, tradition. Computer people are quite used to seeing equals signs
abused: for years FORTRAN and BASIC programmers have been writing
assignment statements like ‘N = N + 1’. One more abuse isn’t much.

Third, tradition. We often read ‘=’ as the word ‘is’. For instance we
verbalize the formula $H_n = O(\log n)$ by saying “H sub n is Big Oh of log n.”


“And to auoide the 
tediouse repetition 
of these woordes: 
is equalie to: I will 
sette as I doe often 
in woorke use, a 
paire of paralleles, 
or Gemowe lines of 
one lengthe, thus: 

=======, bicause
noe .2. thynges, can 
be moare equalie.” 

— R. Recorde [305] 



“It is obvious that 
the sign = is really 
the wrong sign 
for such relations, 
because it suggests 
symmetry, and 
there is no such 
symmetry. . . . 

Once this warning 
has been given, 
there is, however, 
not much harm in 
using the sign =, 
and we shall main- 
tain it, for no other 
reason than that it 
is customary.” 

— N. G. de Bruijn [74]


(Now is a good 
time to do warmup 
exercises 3 and 4.) 
And in English, this ‘is’ is one-way. We say that a bird is an animal, but we
don’t say that an animal is a bird; “animal” is a crudification of “bird.” 

Fourth, for our purposes it’s natural. If we limited our use of O-notation
to situations where it occupies the whole right side of a formula (as in the
harmonic number approximation $H_n = O(\log n)$, or as in the description of
a sorting algorithm’s running time $T(n) = O(n\log n)$), it wouldn’t matter
whether we used ‘=’ or something else. But when we use O-notation in the
middle of an expression, as we usually do in asymptotic calculations, our
intuition is well satisfied if we think of the equals sign as an equality, and if
we think of something like O(1/n) as a very small quantity.

So we’ll continue to use ‘=’, and we’ll continue to regard O(g(n)) as an
incompletely specified function, knowing that we can always fall back on the
set-theoretic definition if we must.

But we ought to mention one more technicality while we’re picking nits
about definitions: If there are several variables in the environment, O-notation
formally represents sets of functions of two or more variables, not just one.
The domain of each function is every variable that is currently “free” to vary.
This concept can be a bit subtle, because a variable might be defined only
in parts of an expression, when it’s controlled by a Σ or something similar.
For example, let’s look closely at the equation

$$\sum_{k=0}^{n}\bigl(k^2 + O(k)\bigr) = \tfrac13 n^3 + O(n^2), \qquad \text{integer } n \ge 0. \tag{9.16}$$

The expression $k^2 + O(k)$ on the left stands for the set of all two-variable
functions of the form $k^2 + f(k,n)$ such that there exists a constant C with
$|f(k,n)| \le Ck$ for $0 \le k \le n$. The sum of this set of functions, for $0 \le k \le n$,
is the set of all functions g(n) of the form

$$\sum_{k=0}^{n}\bigl(k^2 + f(k,n)\bigr) = \tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n + f(0,n) + f(1,n) + \cdots + f(n,n),$$

where f has the stated property. Since we have

$$\begin{aligned}
\bigl|\tfrac12 n^2 + \tfrac16 n + f(0,n) + f(1,n) + \cdots + f(n,n)\bigr|
&\le \tfrac12 n^2 + \tfrac16 n^2 + C\cdot 0 + C\cdot 1 + \cdots + C\cdot n\\
&\le n^2 + C(n^2 + n)/2 \le (C+1)n^2,
\end{aligned}$$

all such functions g(n) belong to the right-hand side of (9.16); therefore (9.16)
is true.
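Here is a concrete member of the left side of (9.16), checked against the bound $(C+1)n^2$ derived above. The perturbation f(k, n) = C·k·sin(k + n) is a hypothetical choice; it satisfies |f(k, n)| ≤ Ck, which is all the argument needs. The helper name `check_9_16` is ours, and the tiny slack term only absorbs floating-point rounding.

```python
import math

def check_9_16(n: int, C: float) -> bool:
    """Test one member g(n) of the left side of (9.16) against (C + 1) n^2.

    The perturbation f(k, n) = C * k * sin(k + n) is an arbitrary example
    with |f(k, n)| <= C * k.
    """
    g = sum(k * k + C * k * math.sin(k + n) for k in range(n + 1))
    return abs(g - n**3 / 3) <= (C + 1) * n**2 + 1e-9 * n**3  # float slack

print(all(check_9_16(n, C) for n in range(300) for C in (0.5, 2.0, 10.0)))
```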

People sometimes abuse O-notation by assuming that it gives an exact
order of growth; they use it as if it specifies a lower bound as well as an
upper bound. For example, an algorithm to sort n numbers might be called
inefficient “because its running time is $O(n^2)$.” But a running time of $O(n^2)$
does not imply that the running time is not also O(n). There’s another
notation, Big Omega, for lower bounds:


$$f(n) = \Omega(g(n)) \iff |f(n)| \ge C|g(n)| \quad \text{for some } C > 0. \tag{9.17}$$

We have f(n) = Ω(g(n)) if and only if g(n) = O(f(n)). A sorting algorithm
whose running time is $\Omega(n^2)$ is inefficient compared with one whose running
time is O(n log n), if n is large enough.

Finally there’s Big Theta, which specifies an exact order of growth:

$$f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n)). \tag{9.18}$$

We have f(n) = Θ(g(n)) if and only if f(n) ≍ g(n) in the notation we saw
previously, equation (9.8).

Edmund Landau [238] invented a “little oh” notation,

$$f(n) = o(g(n)) \iff |f(n)| \le \epsilon|g(n)| \quad \text{for all } n \ge n_0(\epsilon) \text{ and for all constants } \epsilon > 0. \tag{9.19}$$

This is essentially the relation f(n) ≺ g(n) of (9.3). We also have

$$f(n) \sim g(n) \iff f(n) = g(n) + o(g(n)). \tag{9.20}$$

Since Ω and Θ are
uppercase Greek
letters, the O in
O-notation must
be a capital Greek
Omicron.

After all, Greeks
invented asymptotics.


Many authors use ‘o’ in asymptotic formulas, but a more explicit ‘O’
expression is almost always preferable. For example, the average running
time of a computer method called “bubblesort” depends on the asymptotic
value of the sum $P(n) = \sum_{k=0}^{n} k^{n-k}\,k!/n!$. Elementary asymptotic methods
suffice to prove the formula $P(n) \sim \sqrt{\pi n/2}$, which means that the ratio
$P(n)/\sqrt{\pi n/2}$ approaches 1 as n → ∞. However, the true behavior of P(n) is
best understood by considering the difference, $P(n) - \sqrt{\pi n/2}$, not the ratio:

     n    P(n)/√(πn/2)    P(n) − √(πn/2)
     1       0.798           −0.253
    10       0.878           −0.484
    20       0.904           −0.538
    30       0.918           −0.561
    40       0.927           −0.575
    50       0.934           −0.585

Also lD, the Duraflame
logarithm.

Notice that
log log log n
is undefined when
n ≤ 10.

The numerical evidence in the middle column is not very compelling; it certainly
is far from a dramatic proof that $P(n)/\sqrt{\pi n/2}$ approaches 1 rapidly,
if at all. But the right-hand column shows that P(n) is very close indeed to
$\sqrt{\pi n/2}$. Thus we can characterize the behavior of P(n) much better if we can
derive formulas of the form

$$P(n) = \sqrt{\pi n/2} + O(1),$$

or even sharper estimates like

$$P(n) = \sqrt{\pi n/2} - \tfrac23 + O(1/\sqrt{n}).$$

Stronger methods of asymptotic analysis are needed to prove O-results, but
the additional effort required to learn these stronger methods is amply compensated
by the improved understanding that comes with O-bounds.
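The bubblesort table can be recomputed from the definition of P(n). This sketch uses exact rational arithmetic for the sum and floating point only for the final comparison. It assumes nothing beyond the formula $P(n) = \sum_{k=0}^{n} k^{n-k}\,k!/n!$ given in the text.

```python
import math
from fractions import Fraction

def P(n: int) -> Fraction:
    """P(n) = sum_{k=0}^{n} k^(n-k) k!/n!, computed exactly with rationals."""
    return sum(Fraction(k ** (n - k) * math.factorial(k), math.factorial(n))
               for k in range(n + 1))

for n in (1, 10, 20, 30, 40, 50):
    p = float(P(n))
    s = math.sqrt(math.pi * n / 2)
    # Columns as in the table: n, P(n)/sqrt(pi n/2), P(n) - sqrt(pi n/2).
    print(f"{n:3d}  {p / s:.3f}  {p - s:+.3f}")
```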

Moreover, many sorting algorithms have running times of the form

$$T(n) = A\,n\lg n + B\,n + O(\log n)$$

for some constants A and B. Analyses that stop at $T(n) \sim A\,n\lg n$ don’t tell
the whole story, and it turns out to be a bad strategy to choose a sorting algorithm
based just on its A value. Algorithms with a good ‘A’ often achieve this
at the expense of a bad ‘B’. Since n lg n grows only slightly faster than n, the
algorithm that’s faster asymptotically (the one with a slightly smaller A value)
might be faster only for values of n that never actually arise in practice. Thus,
asymptotic methods that allow us to go past the first term and evaluate B
are necessary if we are to make the right choice of method.
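To make the A-versus-B tradeoff concrete, suppose that algorithm 1 costs $1.0\,n\lg n + 10n$ and algorithm 2 costs $1.1\,n\lg n$, ignoring the O(log n) terms. These constants are invented purely for illustration. Setting the two costs equal and dividing by n gives the crossover point.

```python
def crossover_lg_n(A1, B1, A2, B2):
    """Value of lg n where A1 n lg n + B1 n = A2 n lg n + B2 n.

    Assumes A1 < A2 and B1 > B2. Dividing both sides by n gives
    (A2 - A1) lg n = B1 - B2.
    """
    return (B1 - B2) / (A2 - A1)

# Hypothetical algorithms: #1 has the better A, #2 the better B.
lg_n = crossover_lg_n(A1=1.0, B1=10.0, A2=1.1, B2=0.0)
print(lg_n, 2.0 ** lg_n)  # algorithm 1 wins only for n beyond roughly 2**100
```

With these made-up constants, the “asymptotically faster” method pays off only when n exceeds about $2^{100}$.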

Before we go on to study O, let’s talk about one more small aspect of
mathematical style. Three different notations for logarithms have been used
in this chapter: lg, ln, and log. We often use ‘lg’ in connection with computer
methods, because binary logarithms are often relevant in such cases; and
we often use ‘ln’ in purely mathematical calculations, since the formulas for
natural logarithms are nice and simple. But what about ‘log’? Isn’t this
the “common” base-10 logarithm that students learn in high school, the
“common” logarithm that turns out to be very uncommon in mathematics
and computer science? Yes; and many mathematicians confuse the issue
by using ‘log’ to stand for natural logarithms or binary logarithms. There
is no universal agreement here. But we can usually breathe a sigh of relief
when a logarithm appears inside O-notation, because O ignores multiplicative
constants. There is no difference between O(lg n), O(ln n), and O(log n), as
n → ∞; similarly, there is no difference between O(lg lg n), O(ln ln n), and
O(log log n). We get to choose whichever we please; and the one with ‘log’
seems friendlier because it is more pronounceable. Therefore we generally
use ‘log’ in all contexts where it improves readability without introducing
ambiguity.
9.3 O MANIPULATION

The secret of being
a bore is to tell
everything.

— Voltaire

Like any mathematical formalism, the O-notation has rules of manipulation
that free us from the grungy details of its definition. Once we
prove that the rules are correct, using the definition, we can henceforth work
on a higher plane and forget about actually verifying that one set of functions
is contained in another. We don’t even need to calculate the constants C that
are implied by each O, as long as we follow rules that guarantee the existence
of such constants.

For example, we can prove once and for all that

$$n^m = O(n^{m'}), \qquad \text{when } m \le m'; \tag{9.21}$$

$$O(f(n)) + O(g(n)) = O\bigl(|f(n)| + |g(n)|\bigr). \tag{9.22}$$

Then we can say immediately that $\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n = O(n^3) + O(n^3) + O(n^3) = O(n^3)$,
without the laborious calculations in the previous section.

Here are some more rules that follow easily from the definition:

$$f(n) = O(f(n)); \tag{9.23}$$

$$c\cdot O(f(n)) = O(f(n)), \qquad \text{if } c \text{ is constant}; \tag{9.24}$$

$$O\bigl(O(f(n))\bigr) = O(f(n)); \tag{9.25}$$

$$O(f(n))\,O(g(n)) = O\bigl(f(n)g(n)\bigr); \tag{9.26}$$

$$O\bigl(f(n)g(n)\bigr) = f(n)\,O(g(n)). \tag{9.27}$$

Exercise 9 proves (9.22), and the proofs of the others are similar. We can
always replace something of the form on the left by what’s on the right,
regardless of the side conditions on the variable n.

Equations (9.27) and (9.23) allow us to derive the identity $O(f(n)^2) =
O(f(n))^2$. This sometimes helps avoid parentheses, since we can write

$$O(\log n)^2 \quad \text{instead of} \quad O\bigl((\log n)^2\bigr).$$

Both of these are preferable to ‘$O(\log^2 n)$’, which is ambiguous because some
authors use it to mean ‘O(log log n)’.

Can we also write

$$O(\log n)^{-1} \quad \text{instead of} \quad O\bigl((\log n)^{-1}\bigr)?$$

No! This is an abuse of notation, since the set of functions 1/O(log n) is
neither a subset nor a superset of O(1/log n). We could legitimately substitute
$\Omega(\log n)^{-1}$ for $O((\log n)^{-1})$, but this would be awkward. So we’ll restrict
our use of “exponents outside the O” to constant, positive integer exponents.


(Note: The formula
$O(f(n))^2$ does not
denote the set of
all functions $g(n)^2$
where g(n) is in
O(f(n)); such
functions $g(n)^2$
cannot be negative,
but the set
$O(f(n))^2$ includes
negative functions.
In general, when
S is a set, the notation
$S^2$ stands
for the set of all
products $s_1 s_2$ with
$s_1$ and $s_2$ in S,
not for the set of
all squares $s^2$ with
s ∈ S.)





Remember that
ℜ stands for “real
part.”


Power series give us some of the most useful operations of all. If the sum

$$S(z) = \sum_{n\ge0} a_n z^n$$

converges absolutely for some complex number $z = z_0$, then

$$S(z) = O(1), \qquad \text{for all } |z| \le |z_0|.$$

This is obvious, because

$$|S(z)| \le \sum_{n\ge0} |a_n|\,|z|^n \le \sum_{n\ge0} |a_n|\,|z_0|^n = C < \infty.$$

In particular, S(z) = O(1) as z → 0, and S(1/n) = O(1) as n → ∞, provided
only that S(z) converges for at least one nonzero value of z. We can use this
principle to truncate a power series at any convenient point and estimate the
remainder with O. For example, not only is S(z) = O(1), but

$$S(z) = a_0 + O(z),$$

$$S(z) = a_0 + a_1 z + O(z^2),$$

and so on, because

$$S(z) = \sum_{0\le k<m} a_k z^k + z^m \sum_{n\ge m} a_n z^{n-m}$$

and the latter sum, like S(z) itself, converges absolutely for $z = z_0$ and is
O(1). Table 452 lists some of the most useful asymptotic formulas, half of
which are simply based on truncation of power series according to this rule.
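The truncation rule can be watched in action on $e^z$, whose series converges absolutely for every z. If $e^z = \sum_{k<m} z^k/k! + O(z^m)$, then the remainder divided by $z^m$ must stay bounded as z → 0. A minimal sketch follows; the helper name `remainder_ratio` is ours.

```python
import math

def remainder_ratio(z: float, m: int) -> float:
    """(e^z - sum_{k<m} z^k/k!) / z^m; bounded as z -> 0 if e^z = sum + O(z^m)."""
    partial = sum(z ** k / math.factorial(k) for k in range(m))
    return (math.exp(z) - partial) / z ** m

for z in (0.1, 0.01, 0.001):
    # The two ratios approach the next coefficients, 1/1! and 1/2!.
    print(z, remainder_ratio(z, 1), remainder_ratio(z, 2))
```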

Dirichlet series, which are sums of the form $\sum_{k\ge1} a_k/k^z$, can be truncated
in a similar way: If a Dirichlet series converges absolutely when $z = z_0$,
we can truncate it at any term and get the approximation

$$\sum_{1\le k<m} a_k/k^z + O(m^{-z}),$$

valid for $\Re z \ge \Re z_0$. The asymptotic formula for Bernoulli numbers $B_n$ in
Table 452 illustrates this principle.
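Formula (9.30) can be compared against exact Bernoulli numbers. This sketch generates them from the standard recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0$ for m ≥ 1, which gives $B_1 = -\tfrac12$; that recurrence is a standard fact, not taken from this section. It then compares them with the truncated Dirichlet-series approximation. The relative error should shrink roughly like $4^{-n}$. Function names are our own.

```python
import math
from fractions import Fraction

def bernoulli(N: int) -> list:
    """Exact B_0..B_N from B_0 = 1 and sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(math.comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def bernoulli_approx(n: int) -> float:
    """Right-hand side of (9.30) with its O(4^-n) term dropped."""
    if n % 2:
        return 0.0
    sign = (-1) ** (n // 2 - 1)
    return 2 * sign * math.factorial(n) / (2 * math.pi) ** n * (1 + 2.0 ** -n + 3.0 ** -n)

B = bernoulli(20)
for n in range(2, 21, 2):
    # Relative error of the truncated formula, roughly 4^-n for large n.
    print(n, B[n], float(B[n]) / bernoulli_approx(n) - 1)
```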

On the other hand, the asymptotic formulas for $H_n$, n!, and π(n) in
Table 452 are not truncations of convergent series; if we extended them indefinitely
they would diverge for all values of n. This is particularly easy to
see in the case of π(n), since we have already observed in Section 7.3, Example 5,
that the power series $\sum_{k\ge0} k!/(\ln n)^k$ is everywhere divergent. Yet
these truncations of divergent series turn out to be useful approximations.
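Table 452  Asymptotic approximations, valid as n → ∞ and z → 0.

$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\Bigl(\frac{1}{n^6}\Bigr). \tag{9.28}$$

$$n! = \sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^n\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\Bigl(\frac{1}{n^4}\Bigr)\Bigr). \tag{9.29}$$

$$B_n = 2[n \text{ even}](-1)^{n/2-1}\frac{n!}{(2\pi)^n}\bigl(1 + 2^{-n} + 3^{-n} + O(4^{-n})\bigr). \tag{9.30}$$

$$\pi(n) = \frac{n}{\ln n} + \frac{n}{(\ln n)^2} + \frac{2!\,n}{(\ln n)^3} + \frac{3!\,n}{(\ln n)^4} + O\Bigl(\frac{n}{(\log n)^5}\Bigr). \tag{9.31}$$

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + O(z^5). \tag{9.32}$$

$$\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + O(z^5). \tag{9.33}$$

$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + z^4 + O(z^5). \tag{9.34}$$

$$(1+z)^{\alpha} = 1 + \alpha z + \binom{\alpha}{2}z^2 + \binom{\alpha}{3}z^3 + \binom{\alpha}{4}z^4 + O(z^5). \tag{9.35}$$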


An asymptotic approximation is said to have absolute error O(g(n))
if it has the form f(n) + O(g(n)) where f(n) doesn’t involve O. The approximation
has relative error O(g(n)) if it has the form f(n)(1 + O(g(n)))
where f(n) doesn’t involve O. For example, the approximation for $H_n$ in
Table 452 has absolute error $O(n^{-6})$; the approximation for n! has relative
error $O(n^{-4})$. (The right-hand side of (9.29) doesn’t actually have the required
form $f(n)(1 + O(n^{-4}))$, but we could rewrite it

$$\sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^n\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3}\Bigr)\bigl(1 + O(n^{-4})\bigr)$$

if we wanted to; a similar calculation is the subject of exercise 12.) The absolute
error of this approximation is $O(n^{n-3.5}e^{-n})$. Absolute error is related
to the number of correct decimal digits to the right of the decimal point if
the O term is ignored; relative error corresponds to the number of correct
“significant figures.”
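The absolute-versus-relative distinction shows up directly in numbers. The error in (9.28) should shrink like $n^{-6}$ in absolute terms, while the error in (9.29) should shrink like $n^{-4}$ relative to n!. The sketch below drops the O terms from both formulas and rescales the errors. The comment about limiting values relies on the next terms of the two expansions, $-1/(252n^6)$ and $-571/(2488320n^4)$. Those terms are standard but are not stated in this section.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, to double precision

def harmonic_error(n: int) -> float:
    """Absolute error of (9.28) without its O term; should be O(n^-6)."""
    H = math.fsum(1 / k for k in range(1, n + 1))
    approx = math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n**2) + 1 / (120 * n**4)
    return H - approx

def stirling_rel_error(n: int) -> float:
    """Relative error of (9.29) without its O term; should be O(n^-4)."""
    series = 1 + 1 / (12 * n) + 1 / (288 * n**2) - 139 / (51840 * n**3)
    approx = math.sqrt(2 * math.pi * n) * (n / math.e) ** n * series
    return math.factorial(n) / approx - 1

for n in (5, 10, 20):
    # Scaled errors stay bounded, tending to -1/252 and -571/2488320.
    print(n, harmonic_error(n) * n**6, stirling_rel_error(n) * n**4)
```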

We can use truncation of power series to prove the general laws

$$\ln\bigl(1 + O(f(n))\bigr) = O(f(n)), \qquad \text{if } f(n) \prec 1; \tag{9.36}$$

$$e^{O(f(n))} = 1 + O(f(n)), \qquad \text{if } f(n) = O(1). \tag{9.37}$$


(Relative error
is nice for taking
reciprocals, because
1/(1 + O(ε)) =
1 + O(ε).)
(Here we assume that n → ∞; similar formulas hold for ln(1 + O(f(x))) and
$e^{O(f(x))}$ as x → 0.) For example, let ln(1 + g(n)) be any function belonging
to the left side of (9.36). Then there are constants C, $n_0$, and c such that

$$|g(n)| \le C|f(n)| \le c < 1, \qquad \text{for all } n \ge n_0.$$

It follows that the infinite sum

$$\ln\bigl(1 + g(n)\bigr) = g(n)\cdot\bigl(1 - \tfrac12 g(n) + \tfrac13 g(n)^2 - \cdots\bigr)$$

converges for all $n \ge n_0$, and the parenthesized series is bounded by the
constant $1 + \tfrac12 c + \tfrac13 c^2 + \cdots$. This proves (9.36), and the proof of (9.37) is
similar. Equations (9.36) and (9.37) combine to give the useful formula

$$\bigl(1 + O(f(n))\bigr)^{O(g(n))} = 1 + O\bigl(f(n)g(n)\bigr), \qquad \text{if } f(n) \prec 1 \text{ and } f(n)g(n) = O(1). \tag{9.38}$$
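As a small instance of (9.38), take f(n) = 1/n² and g(n) = n. Then f(n) ≺ 1 and f(n)g(n) = 1/n = O(1), so the formula predicts $(1 + 1/n^2)^n = 1 + O(1/n)$. This example is ours, not the text’s. The sketch evaluates the left side stably with `math.log1p` and `math.expm1`, so that tiny increments are not lost to rounding.

```python
import math

def scaled_excess(n: int) -> float:
    """n * ((1 + 1/n^2)^n - 1), computed stably with log1p and expm1.

    With f(n) = 1/n^2 and g(n) = n, (9.38) predicts
    (1 + 1/n^2)^n = 1 + O(1/n), so this quantity should stay bounded.
    """
    return n * math.expm1(n * math.log1p(1 / n**2))

for n in (10, 100, 1000, 10000):
    print(n, scaled_excess(n))  # approaches 1
```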

Problem 1: Return to the Wheel of Fortune.

Let’s try our luck now at a few asymptotic problems. In Chapter 3 we
derived equation (3.13) for the number of winning positions in a certain game:

$$W = \lfloor N/K \rfloor + \tfrac12 K^2 + \tfrac52 K - 3, \qquad K = \bigl\lfloor \sqrt[3]{N} \bigr\rfloor.$$

And we promised that an asymptotic version of W would be derived in Chapter 9.
Well, here we are in Chapter 9; let’s try to estimate W, as N → ∞.

The main idea here is to remove the floor brackets, replacing K by $N^{1/3} + O(1)$.
Then we can go further and write

$$K = N^{1/3}\bigl(1 + O(N^{-1/3})\bigr);$$

this is called “pulling out the large part.” (We will be using this trick a lot.)
Now we have

$$K^2 = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr)^2 = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) = N^{2/3} + O(N^{1/3})$$

by (9.38) and (9.26). Similarly

$$\lfloor N/K \rfloor = N^{1-1/3}\bigl(1 + O(N^{-1/3})\bigr)^{-1} + O(1) = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) + O(1) = N^{2/3} + O(N^{1/3}).$$

It follows that the number of winning positions is

$$\begin{aligned}
W &= N^{2/3} + O(N^{1/3}) + \tfrac12\bigl(N^{2/3} + O(N^{1/3})\bigr) + O(N^{1/3}) + O(1)\\
&= \tfrac32 N^{2/3} + O(N^{1/3}).
\end{aligned} \tag{9.39}$$
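Estimate (9.39) can be tested against the exact count. This sketch assumes the form of (3.13) quoted above. The integer cube-root helper is our own and avoids floating-point rounding for large N. If (9.39) is right, $(W - \tfrac32 N^{2/3})/N^{1/3}$ stays bounded.

```python
def icbrt(N: int) -> int:
    """Floor of the cube root of N >= 1, exact even for large integers."""
    K = round(N ** (1 / 3))
    while K ** 3 > N:
        K -= 1
    while (K + 1) ** 3 <= N:
        K += 1
    return K

def W(N: int) -> int:
    """(3.13) as quoted above: floor(N/K) + K^2/2 + 5K/2 - 3 with K = floor(cbrt N).

    K^2/2 + 5K/2 = K(K + 5)/2 is always an integer, so // is exact here.
    """
    K = icbrt(N)
    return N // K + K * (K + 5) // 2 - 3

for N in (10**3, 10**6, 10**9, 10**12):
    w = W(N)
    # By (9.39), W - 1.5 N^(2/3) = O(N^(1/3)), so the last column stays bounded.
    print(N, w, (w - 1.5 * N ** (2 / 3)) / N ** (1 / 3))
```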
Notice how the O terms absorb one another until only one remains; this is
typical, and it illustrates why O-notation is useful in the middle of a formula. 

Problem 2: Perturbation of Stirling’s formula.

Stirling’s approximation for n! is undoubtedly the most famous asymptotic
formula of all. We will prove it later in this chapter; for now, let’s just
try to get better acquainted with its properties. We can write one version of
the approximation in the form

$$n! = \sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^n\Bigl(1 + \frac{a}{n} + \frac{b}{n^2} + O(n^{-3})\Bigr), \qquad \text{as } n \to \infty, \tag{9.40}$$

for certain constants a and b. Since this holds for all large n, it must also be
asymptotically true when n is replaced by n − 1:

$$(n-1)! = \sqrt{2\pi(n-1)}\Bigl(\frac{n-1}{e}\Bigr)^{n-1}\Bigl(1 + \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\bigl((n-1)^{-3}\bigr)\Bigr). \tag{9.41}$$

We know, of course, that (n − 1)! = n!/n; hence the right-hand side of this
formula must simplify to the right-hand side of (9.40), divided by n.

Let us therefore try to simplify (9.41). The first factor becomes tractable
if we pull out the large part:

$$\sqrt{2\pi(n-1)} = \sqrt{2\pi n}\,(1 - n^{-1})^{1/2} = \sqrt{2\pi n}\Bigl(1 - \frac{1}{2n} - \frac{1}{8n^2} + O(n^{-3})\Bigr).$$

Equation (9.35) has been used here.

Similarly we have

$$\frac{a}{n-1} = \frac{a}{n}(1 - n^{-1})^{-1} = \frac{a}{n} + \frac{a}{n^2} + O(n^{-3});$$

$$\frac{b}{(n-1)^2} = \frac{b}{n^2}(1 - n^{-1})^{-2} = \frac{b}{n^2} + O(n^{-3});$$

$$O\bigl((n-1)^{-3}\bigr) = O\bigl(n^{-3}(1 - n^{-1})^{-3}\bigr) = O(n^{-3}).$$

The only thing in (9.41) that’s slightly tricky to deal with is the factor
$(n-1)^{n-1}$, which equals

$$n^{n-1}(1 - n^{-1})^{n-1} = n^{n-1}(1 - n^{-1})^{n}\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr).$$
(We are expanding everything out until we get a relative error of $O(n^{-3})$,
because the relative error of a product is the sum of the relative errors of the
individual factors. All of the $O(n^{-3})$ terms will coalesce.)

In order to expand $(1 - n^{-1})^n$, we first compute $\ln(1 - n^{-1})$ and then
form the exponential, $e^{n\ln(1 - n^{-1})}$:

$$\begin{aligned}
(1 - n^{-1})^n &= \exp\bigl(n\ln(1 - n^{-1})\bigr)\\
&= \exp\bigl(n(-n^{-1} - \tfrac12 n^{-2} - \tfrac13 n^{-3} + O(n^{-4}))\bigr)\\
&= \exp\bigl(-1 - \tfrac12 n^{-1} - \tfrac13 n^{-2} + O(n^{-3})\bigr)\\
&= \exp(-1)\cdot\exp(-\tfrac12 n^{-1})\cdot\exp(-\tfrac13 n^{-2})\cdot\exp\bigl(O(n^{-3})\bigr)\\
&= \exp(-1)\cdot\bigl(1 - \tfrac12 n^{-1} + \tfrac18 n^{-2} + O(n^{-3})\bigr)\\
&\qquad\cdot\bigl(1 - \tfrac13 n^{-2} + O(n^{-4})\bigr)\cdot\bigl(1 + O(n^{-3})\bigr)\\
&= e^{-1}\bigl(1 - \tfrac12 n^{-1} - \tfrac{5}{24} n^{-2} + O(n^{-3})\bigr).
\end{aligned}$$

Here we use the notation exp z instead of $e^z$, since it allows us to work with
a complicated exponent on the main line of the formula instead of in the
superscript position. We must expand $\ln(1 - n^{-1})$ with absolute error $O(n^{-4})$
in order to end with a relative error of $O(n^{-3})$, because the logarithm is being
multiplied by n.

The right-hand side of (9.41) has now been reduced to $\sqrt{2\pi n}$ times
$n^{n-1}e^{-n}$ times a product of several factors:

$$\begin{aligned}
&\bigl(1 - \tfrac12 n^{-1} - \tfrac18 n^{-2} + O(n^{-3})\bigr)\\
&\cdot\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr)\\
&\cdot\bigl(1 - \tfrac12 n^{-1} - \tfrac{5}{24} n^{-2} + O(n^{-3})\bigr)\\
&\cdot\bigl(1 + a n^{-1} + (a + b) n^{-2} + O(n^{-3})\bigr).
\end{aligned}$$

Multiplying these out and absorbing all asymptotic terms into one $O(n^{-3})$
yields

$$1 + a n^{-1} + \bigl(a + b - \tfrac{1}{12}\bigr) n^{-2} + O(n^{-3}).$$

Hmmm; we were hoping to get $1 + a n^{-1} + b n^{-2} + O(n^{-3})$, since that’s what
we need to match the right-hand side of (9.40). Has something gone awry?
No, everything is fine, provided that $a + b - \tfrac{1}{12} = b$.

This perturbation argument doesn’t prove the validity of Stirling’s approximation,
but it does prove something: It proves that formula (9.40) cannot
be valid unless $a = \tfrac{1}{12}$. If we had replaced the $O(n^{-3})$ in (9.40) by
$cn^{-3} + O(n^{-4})$ and carried out our calculations to a relative error of $O(n^{-4})$,
we could have deduced that b must be $\tfrac{1}{288}$, as claimed in Table 452. (This is
not the easiest way to determine the values of a and b, but it works.)
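The key arithmetic step in this perturbation argument is multiplying truncated series in x = 1/n. This sketch keeps the coefficients of $x^0$, $x^1$, $x^2$ as exact fractions and drops everything of order $x^3$; the helper and variable names are our own. It confirms two things. First, the factor coming from $(1 - n^{-1})^n$ is $1 - \tfrac12 x - \tfrac{5}{24}x^2$. Second, the first three factors multiply to $1 - \tfrac{1}{12}x^2$, which is where the $-\tfrac{1}{12}$ comes from. The fourth factor, $1 + ax + (a+b)x^2$, then contributes the a and a + b.

```python
from fractions import Fraction as F

DEG = 2  # keep x^0, x^1, x^2 with x = 1/n; everything else is O(x^3)

def mul(p, q):
    """Multiply two truncated series given as coefficient lists [c0, c1, c2]."""
    r = [F(0)] * (DEG + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= DEG:
                r[i + j] += pi * qj
    return r

sqrt_factor = [F(1), F(-1, 2), F(-1, 8)]   # (1 - x)^(1/2)
geom_factor = [F(1), F(1), F(1)]           # (1 - x)^(-1)
exp_half = [F(1), F(-1, 2), F(1, 8)]       # exp(-x/2)
exp_third = [F(1), F(0), F(-1, 3)]         # exp(-x^2/3)

power_factor = mul(exp_half, exp_third)    # e * (1 - x)^n, truncated
product = mul(mul(sqrt_factor, geom_factor), power_factor)
print([str(c) for c in power_factor])      # ['1', '-1/2', '-5/24']
print([str(c) for c in product])           # ['1', '0', '-1/12']
```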
Problem 3: The $n$th prime number.

Equation (9.31) is an asymptotic formula for $\pi(n)$, the number of primes
that do not exceed $n$. If we replace $n$ by $p = P_n$, the $n$th prime number, we
have $\pi(p) = n$; hence

$$ n = \frac{p}{\ln p} + O\Bigl(\frac{p}{(\log p)^2}\Bigr) \tag{9.42} $$

as $n \to \infty$. Let us try to “solve” this equation for $p$; then we will know the
approximate size of the $n$th prime.

The first step is to simplify the $O$ term. If we divide both sides by $p/\ln p$,
we find that $n\ln p/p \to 1$; hence $p/\ln p = O(n)$ and

$$ O\Bigl(\frac{p}{(\log p)^2}\Bigr) = O\Bigl(\frac{n}{\log p}\Bigr) = O\Bigl(\frac{n}{\log n}\Bigr). $$

(We have $(\log p)^{-1} \le (\log n)^{-1}$ because $p \ge n$.)

The second step is to transpose the two sides of (9.42), except for the
$O$ term. This is legal because of the general rule

$$ a_n = b_n + O\bigl(f(n)\bigr) \iff b_n = a_n + O\bigl(f(n)\bigr). \tag{9.43} $$

(Each of these equations follows from the other if we multiply both sides
by $-1$ and then add $a_n + b_n$ to both sides.) Hence

$$ \frac{p}{\ln p} = n + O\Bigl(\frac{n}{\log n}\Bigr) = n\bigl(1 + O(1/\log n)\bigr), $$

and we have

$$ p = n\ln p\,\bigl(1 + O(1/\log n)\bigr). \tag{9.44} $$

This is an “approximate recurrence” for $p = P_n$ in terms of itself. Our goal
is to change it into an “approximate closed form,” and we can do this by
unfolding the recurrence asymptotically. So let’s try to unfold (9.44).

By taking logarithms of both sides we deduce that

$$ \ln p = \ln n + \ln\ln p + O(1/\log n). \tag{9.45} $$

This value can be substituted for $\ln p$ in (9.44), but we would like to get rid
of all $p$’s on the right before making the substitution. Somewhere along the
line, that last $p$ must disappear; we can’t get rid of it in the normal way for
recurrences, because (9.44) doesn’t specify initial conditions for small $p$.

One way to do the job is to start by proving the weaker result $p = O(n^2)$.
This follows if we square (9.44) and divide by $pn^2$,

$$ \frac{p}{n^2} = \frac{(\ln p)^2}{p}\bigl(1 + O(1/\log n)\bigr), $$
Get out the scratch
paper again, gang. 

Boo, Hiss. 


since the right side approaches zero as $n \to \infty$. OK, we know that $p = O(n^2)$;
therefore $\log p = O(\log n)$ and $\log\log p = O(\log\log n)$. We can now conclude
from (9.45) that

$$ \ln p = \ln n + O(\log\log n); $$

in fact, with this new estimate in hand we can conclude that
$\ln\ln p = \ln\ln n + O(\log\log n/\log n)$, and (9.45) now yields

$$ \ln p = \ln n + \ln\ln n + O(\log\log n/\log n). $$

And we can plug this into the right-hand side of (9.44), obtaining

$$ p = n\ln n + n\ln\ln n + O(n). $$

This is the approximate size of the $n$th prime.

We can refine this estimate by using a better approximation of $\pi(n)$ in
place of (9.42). The next term of (9.31) tells us that

$$ n = \frac{p}{\ln p} + \frac{p}{(\ln p)^2} + O\Bigl(\frac{p}{(\log p)^3}\Bigr); \tag{9.46} $$

proceeding as before, we obtain the recurrence

$$ p = n\ln p\,\bigl(1 + (\ln p)^{-1}\bigr)^{-1}\bigl(1 + O(1/\log n)^2\bigr), \tag{9.47} $$

which has a relative error of $O(1/\log n)^2$ instead of $O(1/\log n)$. Taking
logarithms and retaining proper accuracy (but not too much) now yields

$$
\begin{aligned}
\ln p &= \ln n + \ln\ln p + O(1/\log n)\\
&= \ln n\,\Bigl(1 + \frac{\ln\ln p}{\ln n} + O(1/\log n)^2\Bigr);\\
\ln\ln p &= \ln\ln n + \frac{\ln\ln n}{\ln n} + O\Bigl(\frac{\log\log n}{\log n}\Bigr)^2.
\end{aligned}
$$

Finally we substitute these results into (9.47) and our answer finds its way
out:

$$ P_n = n\ln n + n\ln\ln n - n + n\,\frac{\ln\ln n}{\ln n} + O\Bigl(\frac{n}{\log n}\Bigr). \tag{9.48} $$

For example, when $n = 10^6$ this estimate comes to $15631363.8 + O(n/\log n)$;
the millionth prime is actually $15485863$. Exercise 21 shows that a still more
accurate approximation to $P_n$ results if we begin with a still more accurate
approximation to $\pi(n)$ in place of (9.46).
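The arithmetic in this example is easy to reproduce. A minimal sketch (the function name `nth_prime_estimate` is our own) that evaluates the main terms of (9.48), drops the $O(n/\log n)$ term, and compares the result with the quoted value of the millionth prime:

```python
import math


def nth_prime_estimate(n):
    """Main terms of (9.48): n ln n + n ln ln n - n + n (ln ln n)/(ln n)."""
    ln_n = math.log(n)
    lnln_n = math.log(ln_n)
    return n * ln_n + n * lnln_n - n + n * lnln_n / ln_n


n = 10**6
est = nth_prime_estimate(n)
actual = 15485863  # the millionth prime, as quoted in the text
ratio = (est - actual) / (n / math.log(n))
print(f"estimate {est:.1f}, actual {actual}")
print(f"(estimate - actual) / (n / ln n) = {ratio:.2f}")
```

The gap of roughly $1.46\times10^5$ is about twice $n/\ln n$, which is the size one expects from an $O(n/\log n)$ term with a modest constant.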


lnp = 


In lnp = 
Problem 4: A sum from an old final exam.

When Concrete Mathematics was first taught at Stanford University during
the 1970–1971 term, students were asked for the asymptotic value of the
sum

$$ S_n = \frac{1}{n^2+1} + \frac{1}{n^2+2} + \cdots + \frac{1}{n^2+n}, \tag{9.49} $$

with an absolute error of $O(n^{-7})$. Let’s imagine that we’ve just been given
this problem on a (take-home) final; what is our first instinctive reaction?

No, we don’t panic. Our first reaction is to think big. If we set
$n = 10^{100}$, say, and look at the sum, we see that it consists of $n$ terms, each of
which is slightly less than $1/n^2$; hence the sum is slightly less than $1/n$. In
general, we can usually get a decent start on an asymptotic problem by taking
stock of the situation and getting a ballpark estimate of the answer.

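Let’s try to improve the rough estimate by pulling out the largest part
of each term. We have

$$ \frac{1}{n^2+k} = \frac{1}{n^2(1 + k/n^2)} = \frac{1}{n^2}\Bigl(1 - \frac{k}{n^2} + \frac{k^2}{n^4} - \frac{k^3}{n^6} + O\Bigl(\frac{k^4}{n^8}\Bigr)\Bigr), $$

and so it’s natural to try summing all these approximations:

$$
\begin{array}{rcl}
\dfrac{1}{n^2+1} &=& \dfrac{1}{n^2} - \dfrac{1}{n^4} + \dfrac{1^2}{n^6} - \dfrac{1^3}{n^8} + O\Bigl(\dfrac{1^4}{n^{10}}\Bigr)\\[1.5ex]
\dfrac{1}{n^2+2} &=& \dfrac{1}{n^2} - \dfrac{2}{n^4} + \dfrac{2^2}{n^6} - \dfrac{2^3}{n^8} + O\Bigl(\dfrac{2^4}{n^{10}}\Bigr)\\
&\vdots&\\
\dfrac{1}{n^2+n} &=& \dfrac{1}{n^2} - \dfrac{n}{n^4} + \dfrac{n^2}{n^6} - \dfrac{n^3}{n^8} + O\Bigl(\dfrac{n^4}{n^{10}}\Bigr)\\[1.5ex]
\hline
S_n &=& \dfrac{n}{n^2} - \dfrac{n(n+1)}{2n^4} + \cdots.
\end{array}
$$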

It looks as if we’re getting $S_n = n^{-1} - \frac12 n^{-2} + O(n^{-3})$, based on the sums of
the first two columns; but the calculations are getting hairy.

If we persevere in this approach, we will ultimately reach the goal; but
we won’t bother to sum the other columns, for two reasons: First, the last
column is going to give us terms that are $O(n^{-6})$, when $n/2 \le k \le n$, so we
will have an error of $O(n^{-5})$; that’s too big, and we will have to include yet
another column in the expansion. Could the exam-giver have been so sadistic?
We suspect that there must be a better way. Second, there is indeed a much
better way, staring us right in the face.


Do pajamas have 
buttons? 
Namely, we know a closed form for $S_n$: It’s just $H_{n^2+n} - H_{n^2}$. And we
know a good approximation for harmonic numbers, so we just apply it twice:

$$
\begin{aligned}
H_{n^2+n} &= \ln(n^2+n) + \gamma + \frac{1}{2(n^2+n)} - \frac{1}{12(n^2+n)^2} + O\Bigl(\frac{1}{n^8}\Bigr);\\
H_{n^2} &= \ln n^2 + \gamma + \frac{1}{2n^2} - \frac{1}{12n^4} + O\Bigl(\frac{1}{n^8}\Bigr).
\end{aligned}
$$

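Now we can pull out large terms and simplify, as we did when looking at
Stirling’s approximation. We have

$$
\begin{aligned}
\ln(n^2+n) &= \ln n^2 + \ln\Bigl(1 + \frac1n\Bigr) = \ln n^2 + \frac1n - \frac{1}{2n^2} + \frac{1}{3n^3} - \cdots;\\
\frac{1}{n^2+n} &= \frac{1}{n^2} - \frac{1}{n^3} + \frac{1}{n^4} - \cdots;\\
\frac{1}{(n^2+n)^2} &= \frac{1}{n^4} - \frac{2}{n^5} + \frac{3}{n^6} - \cdots.
\end{aligned}
$$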

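So there’s lots of helpful cancellation, and we find

$$
\begin{aligned}
S_n ={}& n^{-1} - \tfrac12 n^{-2} + \tfrac13 n^{-3} - \tfrac14 n^{-4} + \tfrac15 n^{-5} - \tfrac16 n^{-6}\\
&- \tfrac12 n^{-3} + \tfrac12 n^{-4} - \tfrac12 n^{-5} + \tfrac12 n^{-6}\\
&+ \tfrac16 n^{-5} - \tfrac14 n^{-6}
\end{aligned}
$$

plus terms that are $O(n^{-7})$. A bit of arithmetic and we’re home free:

$$ S_n = n^{-1} - \tfrac12 n^{-2} - \tfrac16 n^{-3} + \tfrac14 n^{-4} - \tfrac{2}{15} n^{-5} + \tfrac{1}{12} n^{-6} + O(n^{-7}). \tag{9.50} $$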

It would be nice if we could check this answer numerically, as we did
when we derived exact results in earlier chapters. Asymptotic formulas are
harder to verify; an arbitrarily large constant may be hiding in a $O$ term,
so any numerical test is inconclusive. But in practice, we have no reason to
believe that an adversary is trying to trap us, so we can assume that the
unknown $O$-constants are reasonably small. With a pocket calculator we find
that $S_4 = \frac{1}{17} + \frac{1}{18} + \frac{1}{19} + \frac{1}{20} = 0.2170107$; and our asymptotic estimate when
$n = 4$ comes to

$$ \frac14\Bigl(1 + \frac14\Bigl(-\frac12 + \frac14\Bigl(-\frac16 + \frac14\Bigl(\frac14 + \frac14\Bigl(-\frac{2}{15} + \frac14\cdot\frac{1}{12}\Bigr)\Bigr)\Bigr)\Bigr)\Bigr) = 0.2170125. $$

If we had made an error of, say, $\frac{1}{12}$ in the term for $n^{-6}$, a difference of $\frac{1}{12}\cdot\frac{1}{4096}$
would have shown up in the fifth decimal place; so our asymptotic answer is
probably correct.
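The pocket-calculator check can be redone in exact arithmetic and pushed to larger $n$. A minimal sketch, with function names of our own: `S_exact` sums (9.49) directly, and `S_asymptotic` evaluates (9.50) without its $O(n^{-7})$ term by the same nested (Horner) form displayed above.

```python
from fractions import Fraction


def S_exact(n):
    """S_n = 1/(n^2+1) + ... + 1/(n^2+n), computed exactly."""
    return sum(Fraction(1, n * n + k) for k in range(1, n + 1))


# Coefficients of n^-1, ..., n^-6 in (9.50).
COEFFS = [Fraction(1), Fraction(-1, 2), Fraction(-1, 6),
          Fraction(1, 4), Fraction(-2, 15), Fraction(1, 12)]


def S_asymptotic(n):
    """(9.50) with the O(n^-7) term dropped, evaluated by Horner's rule."""
    x = Fraction(1, n)
    acc = Fraction(0)
    for c in reversed(COEFFS):
        acc = (acc + c) * x
    return acc


for n in (4, 10, 100):
    exact, approx = S_exact(n), S_asymptotic(n)
    # If (9.50) is right, the last column should stay bounded as n grows.
    print(n, float(exact), float(approx), float((exact - approx) * n**7))
```

Scaling the difference by $n^7$ turns the claim "absolute error $O(n^{-7})$" into something one can watch: the scaled column should settle down rather than grow.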
Problem 5: An infinite sum.

We turn now to an asymptotic question posed by Solomon Golomb [152]:
What is the approximate value of

$$ S_n = \sum_{k\ge1}\frac{1}{k\,N_n(k)^2}, \tag{9.51} $$

where $N_n(k)$ is the number of digits required to write $k$ in radix $n$ notation?

First let’s try again for a ballpark estimate. The number of digits, $N_n(k)$,
is approximately $\log_n k = \log k/\log n$; so the terms of this sum are roughly
$(\log n)^2/k(\log k)^2$. Summing on $k$ gives $\approx (\log n)^2\sum_{k\ge2} 1/k(\log k)^2$, and this
sum converges to a constant value because it can be compared to the integral

$$ \int_2^\infty\frac{dx}{x(\ln x)^2} = -\frac{1}{\ln x}\Big|_2^\infty = \frac{1}{\ln 2}. $$

Therefore we expect $S_n$ to be about $C(\log n)^2$, for some constant $C$.

Hand-wavy analyses like this are useful for orientation, but we need better
estimates to solve the problem. One idea is to express $N_n(k)$ exactly:

$$ N_n(k) = \lfloor \log_n k \rfloor + 1. \tag{9.52} $$

Thus, for example, $k$ has three radix $n$ digits when $n^2 \le k < n^3$, and this
happens precisely when $\lfloor\log_n k\rfloor = 2$. It follows that $N_n(k) > \log_n k$, hence
$S_n = \sum_{k\ge1} 1/kN_n(k)^2 < 1 + (\log n)^2\sum_{k\ge2} 1/k(\log k)^2$.

Proceeding as in Problem 1, we can try to write $N_n(k) = \log_n k + O(1)$
and substitute this into the formula for $S_n$. The term represented here by $O(1)$
is always between 0 and 1, and it is about $\frac12$ on the average, so it seems rather
well-behaved. But still, this isn’t a good enough approximation to tell us
about $S_n$; it gives us zero significant figures (that is, high relative error) when
$k$ is small, and these are the terms that contribute the most to the sum. We
need a different idea.

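The key (as in Problem 4) is to use our manipulative skills to put the
sum into a more tractable form, before we resort to asymptotic estimates. We
can introduce a new variable of summation, $m = N_n(k)$:

$$
\begin{aligned}
S_n &= \sum_{k,m\ge1}\frac{[m = N_n(k)]}{km^2} = \sum_{k,m\ge1}\frac{[n^{m-1} \le k < n^m]}{km^2}\\
&= \sum_{m\ge1}\frac{1}{m^2}\bigl(H_{n^m-1} - H_{n^{m-1}-1}\bigr).
\end{aligned}
$$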
Into a Big Oh.


This may look worse than the sum we began with, but it’s actually a step
forward, because we have very good approximations for the harmonic numbers.

Still, we hold back and try to simplify some more. No need to rush into
asymptotics. Summation by parts allows us to group the terms for each value
of $H_{n^m-1}$ that we need to approximate:

$$ S_n = \sum_{k\ge1} H_{n^k-1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr). $$

For example, $H_{n^2-1}$ is multiplied by $1/2^2$ and then by $-1/3^2$. (We have used
the fact that $H_{n^0-1} = H_0 = 0$.)

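Now we’re ready to expand the harmonic numbers. Our experience with
estimating $(n-1)!$ has taught us that it will be easier to estimate $H_{n^k}$ than
$H_{n^k-1}$, since the $(n^k - 1)$’s will be messy; therefore we write

$$
\begin{aligned}
H_{n^k-1} &= H_{n^k} - \frac{1}{n^k} = \ln n^k + \gamma + \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr) - \frac{1}{n^k}\\
&= k\ln n + \gamma - \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr).
\end{aligned}
$$

Our sum now reduces to

$$
\begin{aligned}
S_n &= \sum_{k\ge1}\Bigl(k\ln n + \gamma - \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr)\Bigr)\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr)\\
&= (\ln n)\,\Sigma_1 + \gamma\,\Sigma_2 - \tfrac12\,\Sigma_3(n) + O\bigl(\Sigma_3(n^2)\bigr).
\end{aligned}
\tag{9.53}
$$

There are four easy pieces left: $\Sigma_1$, $\Sigma_2$, $\Sigma_3(n)$, and $\Sigma_3(n^2)$.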

Let’s do the $\Sigma_3$’s first, since $\Sigma_3(n^2)$ is the $O$ term; then we’ll see what
sort of error we’re getting. (There’s no sense carrying out other calculations
with perfect accuracy if they will be absorbed into a $O$ anyway.) This sum is
simply a power series,

$$ \Sigma_3(x) = \sum_{k\ge1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr)x^{-k}, $$

and the series converges when $x \ge 1$, so we can truncate it at any desired
point. If we stop $\Sigma_3(n^2)$ at the term for $k = 1$, we get $\Sigma_3(n^2) = O(n^{-2})$;
hence (9.53) has an absolute error of $O(n^{-2})$. (To decrease this absolute error,
we could use a better approximation to $H_{n^k}$; but $O(n^{-2})$ is good enough for
now.) If we truncate $\Sigma_3(n)$ at the term for $k = 2$, we get

$$ \Sigma_3(n) = \tfrac34 n^{-1} + O(n^{-2}); $$

this is all the accuracy we need.
We might as well do $\Sigma_2$ now, since it is so easy:

$$ \Sigma_2 = \sum_{k\ge1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr). $$

This is the telescoping series $(1 - \frac14) + (\frac14 - \frac19) + (\frac19 - \frac1{16}) + \cdots = 1$.

Finally, $\Sigma_1$ gives us the leading term of $S_n$, the coefficient of $\ln n$ in
(9.53):

$$ \Sigma_1 = \sum_{k\ge1} k\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr). $$

This is $(1 - \frac14) + (\frac24 - \frac29) + (\frac39 - \frac3{16}) + \cdots = \frac11 + \frac14 + \frac19 + \cdots = H^{(2)}_\infty = \pi^2/6$. (If
we hadn’t applied summation by parts earlier, we would have seen directly
that $S_n \sim \sum_{k\ge1}(\ln n)/k^2$, because $H_{n^k-1} - H_{n^{k-1}-1} \sim \ln n$; so summation
by parts didn’t help us to evaluate the leading term, although it did make
some of our other work easier.)

Now we have evaluated each of the $\Sigma$’s in (9.53), so we can put everything
together and get the answer to Golomb’s problem:

$$ S_n = \frac{\pi^2}{6}\ln n + \gamma - \frac{3}{8n} + O\Bigl(\frac{1}{n^2}\Bigr). \tag{9.54} $$

Notice that this grows more slowly than our original hand-wavy estimate of
$C(\log n)^2$. Sometimes a discrete sum fails to obey a continuous intuition.

Problem 6: Big Phi.

Near the end of Chapter 4, we observed that the number of fractions in
the Farey series $\mathcal{F}_n$ is $1 + \Phi(n)$, where

$$ \Phi(n) = \varphi(1) + \varphi(2) + \cdots + \varphi(n); $$

and we showed in (4.62) that

$$ \Phi(n) = \frac12\sum_{k\ge1}\mu(k)\lfloor n/k\rfloor\lfloor 1 + n/k\rfloor. \tag{9.55} $$

Let us now try to estimate $\Phi(n)$ when $n$ is large. (It was sums like this that
led Bachmann to invent $O$-notation in the first place.)

Thinking big tells us that $\Phi(n)$ will probably be proportional to $n^2$.
For if the final factor were just $\lfloor n/k\rfloor$ instead of $\lfloor 1 + n/k\rfloor$, we would have
$|\Phi(n)| \le \frac12\sum_{k\ge1}\lfloor n/k\rfloor^2 \le \frac12\sum_{k\ge1}(n/k)^2 = \frac{\pi^2}{12}n^2$, because the Möbius
function $\mu(k)$ is either $-1$, $0$, or $+1$. The additional ‘$1 +$’ in that final factor
adds $\frac12\sum_{k\ge1}\mu(k)\lfloor n/k\rfloor$; but each term of this sum is zero for $k > n$, so it cannot be more than
$nH_n = O(n\log n)$ in absolute value.
(The error term was
shown to be at most
$O\bigl(n(\log n)^{2/3}(\log\log n)^{1+\epsilon}\bigr)$
by Saltykov in
1960 [316]. On
the other hand, it
is not as small as
$o\bigl(n(\log\log n)^{1/2}\bigr)$,
according to Mont-
gomery [275].)


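This preliminary analysis indicates that we’ll find it advantageous to
write

$$
\begin{aligned}
\Phi(n) &= \frac12\sum_{k=1}^n\mu(k)\bigl((n/k) + O(1)\bigr)^2
 = \frac12\sum_{k=1}^n\mu(k)\Bigl(\Bigl(\frac nk\Bigr)^2 + O\Bigl(\frac nk\Bigr)\Bigr)\\
&= \frac12\sum_{k=1}^n\mu(k)\Bigl(\frac nk\Bigr)^2 + \sum_{k=1}^n O\Bigl(\frac nk\Bigr)
 = \frac12\sum_{k=1}^n\mu(k)\Bigl(\frac nk\Bigr)^2 + O(n\log n).
\end{aligned}
$$

This removes the floors; the remaining problem is to evaluate the unfloored
sum $\frac12\sum_{k=1}^n\mu(k)n^2/k^2$ with an accuracy of $O(n\log n)$; in other words, we
want to evaluate $\sum_{k=1}^n\mu(k)/k^2$ with an accuracy of $O(n^{-1}\log n)$. But that’s
easy; we can simply run the sum all the way up to $k = \infty$, because the newly
added terms are

$$
\begin{aligned}
\sum_{k>n}\frac{\mu(k)}{k^2} &= O\Bigl(\sum_{k>n}\frac{1}{k^2}\Bigr) = O\Bigl(\sum_{k>n}\frac{1}{k(k-1)}\Bigr)\\
&= O\Bigl(\sum_{k>n}\Bigl(\frac{1}{k-1} - \frac1k\Bigr)\Bigr) = O\Bigl(\frac1n\Bigr).
\end{aligned}
$$

We proved in (7.89) that $\sum_{k\ge1}\mu(k)/k^z = 1/\zeta(z)$. Hence $\sum_{k\ge1}\mu(k)/k^2 =
1/\bigl(\sum_{k\ge1}1/k^2\bigr) = 6/\pi^2$, and we have our answer:

$$ \Phi(n) = \frac{3}{\pi^2}n^2 + O(n\log n). \tag{9.56} $$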
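Both (9.55) and (9.56) are easy to test numerically. The sketch below, with helper names of our own choosing, sieves $\varphi(k)$ and $\mu(k)$ up to $n$, computes $\Phi(n)$ both from its definition and from (9.55), and compares the result with $3n^2/\pi^2$. By (9.56) the last printed column, the error divided by $n\ln n$, should stay bounded.

```python
import math


def phi_and_mu(n):
    """Euler's totient phi(k) and Mobius mu(k) for 0 <= k <= n, by sieving."""
    phi = list(range(n + 1))
    mu = [1] * (n + 1)
    for p in range(2, n + 1):
        if phi[p] == p:  # not yet reduced by a smaller prime, so p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return phi, mu


def big_phi(n):
    """Phi(n) = phi(1) + ... + phi(n), straight from the definition."""
    phi, _ = phi_and_mu(n)
    return sum(phi[1:])


def big_phi_via_955(n):
    """Phi(n) from (9.55); note that floor(1 + n/k) = 1 + n//k."""
    _, mu = phi_and_mu(n)
    twice = sum(mu[k] * (n // k) * (1 + n // k) for k in range(1, n + 1))
    return twice // 2


for n in (10, 100, 1000, 10000):
    exact = big_phi(n)
    main = 3 * n * n / math.pi ** 2
    print(n, exact, big_phi_via_955(n), round(main, 1),
          round((exact - main) / (n * math.log(n)), 4))
```

For $n = 10$ the count is $\Phi(10) = 32$, so $\mathcal{F}_{10}$ has 33 fractions.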


9.4 TWO ASYMPTOTIC TRICKS 

Now that we have some facility with O manipulations, let’s look at 
what we’ve done from a slightly higher perspective. Then we’ll have some 
important weapons in our asymptotic arsenal, when we need to do battle 
with tougher problems. 

Trick 1: Bootstrapping.

When we estimated the $n$th prime $P_n$ in Problem 3 of Section 9.3, we
solved an asymptotic recurrence of the form

$$ P_n = n\ln P_n\,\bigl(1 + O(1/\log n)\bigr). $$

We proved that $P_n = n\ln n + n\ln\ln n + O(n)$ by first using the recurrence to show
the weaker result $O(n^2)$. This is a special case of a general method called
bootstrapping, in which we solve a recurrence asymptotically by starting with
a rough estimate and plugging it into the recurrence; in this way we can often
derive better and better estimates, “pulling ourselves up by our bootstraps.”

Here’s another problem that illustrates bootstrapping nicely: What is the
asymptotic value of the coefficient $g_n = [z^n]\,G(z)$ in the generating function

$$ G(z) = \exp\Bigl(\sum_{k\ge1}\frac{z^k}{k^2}\Bigr), \tag{9.57} $$

as $n \to \infty$? If we differentiate this equation with respect to $z$, we find

$$ G'(z) = \sum_{n\ge0} ng_nz^{n-1} = \Bigl(\sum_{k\ge1}\frac{z^{k-1}}{k}\Bigr)G(z); $$

equating coefficients of $z^{n-1}$ on both sides gives the recurrence

$$ ng_n = \sum_{0\le k<n}\frac{g_k}{n-k}. \tag{9.58} $$

Our problem is equivalent to finding an asymptotic formula for the solution
to (9.58), with the initial condition $g_0 = 1$. The first few values

$$
\begin{array}{c|ccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6\\ \hline
g_n & 1 & 1 & \frac34 & \frac{19}{36} & \frac{107}{288} & \frac{641}{2400} & \frac{51103}{259200}
\end{array}
$$

don’t reveal much of a pattern, and the integer sequence $\langle n!^2g_n\rangle$ doesn’t
appear in Sloane’s Handbook [330]; therefore a closed form for $g_n$ seems out
of the question, and asymptotic information is probably the best we can hope
to derive.
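The table can be regenerated straight from the recurrence (9.58). A minimal sketch in exact rational arithmetic (the function name `g_values` is our own); it also multiplies by $n!^2$ to confirm, for these small $n$, that $n!^2g_n$ is an integer, as the remark about Sloane’s Handbook presupposes.

```python
import math
from fractions import Fraction


def g_values(count):
    """g_0, ..., g_{count-1} from recurrence (9.58), in exact arithmetic."""
    g = [Fraction(1)]  # initial condition g_0 = 1
    for n in range(1, count):
        g.append(sum(g[k] / (n - k) for k in range(n)) / n)
    return g


g = g_values(7)
scaled = [math.factorial(n) ** 2 * x for n, x in enumerate(g)]  # n!^2 g_n
print([str(x) for x in g])
print([str(x) for x in scaled])
```

This costs $O(n^2)$ rational operations, which is fine for a table but not for asymptotics; a floating-point variant is the natural next step.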

Our first handle on this problem is the observation that $0 < g_n \le 1$ for
all $n \ge 0$; this is easy to prove by induction. So we have a start:

$$ g_n = O(1). $$

This equation can, in fact, be used to “prime the pump” for a bootstrapping
operation: Plugging it in on the right of (9.58) yields

$$ ng_n = \sum_{0\le k<n}\frac{O(1)}{n-k} = H_n\,O(1) = O(\log n); $$

hence we have

$$ g_n = O\Bigl(\frac{\log n}{n}\Bigr), \qquad\text{for } n > 1. $$
And we can bootstrap yet again:

$$
\begin{aligned}
ng_n &= \frac1n + \sum_{0<k<n}\frac{O\bigl((1 + \log k)/k\bigr)}{n-k}
 = \frac1n + \sum_{0<k<n}\frac{O(\log n)}{k(n-k)}\\
&= \frac1n + \sum_{0<k<n}\Bigl(\frac1k + \frac{1}{n-k}\Bigr)\frac{O(\log n)}{n}
 = \frac1n + \frac2nH_{n-1}\,O(\log n) = \frac1n\,O(\log n)^2,
\end{aligned}
$$

obtaining

$$ g_n = O\Bigl(\frac{\log n}{n}\Bigr)^2. \tag{9.59} $$

Will this go on forever? Perhaps we’ll have $g_n = O(n^{-1}\log n)^m$ for all $m$.

Actually no; we have just reached a point of diminishing returns. The
next attempt at bootstrapping involves the sum

$$
\begin{aligned}
\sum_{0<k<n}\frac{1}{k^2(n-k)} &= \sum_{0<k<n}\Bigl(\frac{1}{nk^2} + \frac{1}{n^2k} + \frac{1}{n^2(n-k)}\Bigr)\\
&= \frac1nH^{(2)}_{n-1} + \frac{2}{n^2}H_{n-1},
\end{aligned}
$$

which is $\Omega(n^{-1})$; so we cannot get an estimate for $g_n$ that falls below $\Omega(n^{-2})$.

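In fact, we now know enough about $g_n$ to apply our old trick of pulling
out the largest part:

$$
\begin{aligned}
ng_n &= \sum_{0\le k<n}\frac{g_k}{n} + \sum_{0\le k<n}g_k\Bigl(\frac{1}{n-k} - \frac1n\Bigr)\\
&= \frac1n\sum_{k\ge0}g_k - \frac1n\sum_{k\ge n}g_k + \frac1n\sum_{0\le k<n}\frac{kg_k}{n-k}.
\end{aligned}
\tag{9.60}
$$

The first sum here is $G(1) = \exp(\frac11 + \frac14 + \frac19 + \cdots) = e^{\pi^2/6}$, because $G(z)$
converges for all $|z| \le 1$. The second sum is the tail of the first; we can get an
upper bound by using (9.59):

$$ \sum_{k\ge n}g_k = O\Bigl(\sum_{k\ge n}\frac{(\log k)^2}{k^2}\Bigr) = O\Bigl(\frac{(\log n)^2}{n}\Bigr). $$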
This last estimate follows because, for example,

$$ \sum_{k>n}\frac{(\log k)^2}{k^2} < \sum_{m\ge1}\;\sum_{n^m<k\le n^{m+1}}\frac{(\log n^{m+1})^2}{k(k-1)} < \sum_{m\ge1}\frac{(m+1)^2(\log n)^2}{n^m}. $$

(Exercise 54 discusses a more general way to estimate such tails.)
The third sum in (9.60) is

$$ O\Bigl(\sum_{0<k<n}\frac{(\log n)^2}{k(n-k)}\Bigr) = O\Bigl(\frac{(\log n)^3}{n}\Bigr), $$

by an argument that’s already familiar. So (9.60) proves that

$$ g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n)^3. \tag{9.61} $$

Finally, we can feed this formula back into the recurrence, bootstrapping once
more; the result is

$$ g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n^3). \tag{9.62} $$

(Exercise 23 peeks inside the remaining $O$ term.)
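A numerical look at (9.62), as a sketch under our own choices: floating-point arithmetic, the helper name `g_floats`, and a cutoff of $n = 1000$. Equation (9.62) says $n^2g_n/e^{\pi^2/6} = 1 + O(\log n/n)$, so the printed ratio should approach 1. The last column, the deviation rescaled by $n/\ln n$, should stay bounded.

```python
import math


def g_floats(count):
    """g_0, ..., g_{count-1} from (9.58) in floating point; O(count^2) work."""
    g = [1.0]  # g_0 = 1
    for n in range(1, count):
        g.append(sum(g[k] / (n - k) for k in range(n)) / n)
    return g


C = math.exp(math.pi ** 2 / 6)  # G(1) = e^{pi^2/6}
g = g_floats(1001)
for n in (10, 100, 1000):
    ratio = n * n * g[n] / C
    print(n, round(ratio, 5), round((ratio - 1) * n / math.log(n), 3))
```

Convergence is slow, as (9.62) predicts: the relative error shrinks only like $(\log n)/n$.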

Trick 2: Trading tails.

We derived (9.62) in somewhat the same way we derived the asymptotic
value (9.56) of $\Phi(n)$: In both cases we started with a finite sum but got an
asymptotic value by considering an infinite sum. We couldn’t simply get the
infinite sum by introducing $O$ into the summand; we had to be careful to use
one approach when $k$ was small and another when $k$ was large.

Those derivations were special cases of an important three-step asymptotic
summation method we will now discuss in greater generality. Whenever
we want to estimate the value of $\sum_k a_k(n)$, we can try the following approach:

1 First break the sum into two disjoint ranges, $D_n$ and $T_n$. The summation
over $D_n$ should be the “dominant” part, in the sense that it includes
enough terms to determine the significant digits of the sum, when $n$ is
large. The summation over the other range $T_n$ should be just the “tail”
end, which contributes little to the overall total.

2 Find an asymptotic estimate

$$ a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr) $$

that is valid when $k \in D_n$. The $O$ bound need not hold when $k \in T_n$.


(This important
method was
pioneered by
Laplace [240].)
Asymptotics is
the art of knowing 
where to be sloppy 
and where to be 
precise. 


3 Now prove that each of the following three sums is small:

$$
\Sigma_a(n) = \sum_{k\in T_n}a_k(n);\qquad \Sigma_b(n) = \sum_{k\in T_n}b_k(n);\qquad
\Sigma_c(n) = \sum_{k\in D_n}\bigl|c_k(n)\bigr|. \tag{9.63}
$$

If all three steps can be completed successfully, we have a good estimate:

$$ \sum_{k\in D_n\cup T_n}a_k(n) = \sum_{k\in D_n\cup T_n}b_k(n) + O\bigl(\Sigma_a(n)\bigr) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_c(n)\bigr). $$

Here’s why. We can “chop off” the tail of the given sum, getting a good
estimate in the range $D_n$ where a good estimate is necessary:

$$ \sum_{k\in D_n}a_k(n) = \sum_{k\in D_n}\bigl(b_k(n) + O(c_k(n))\bigr) = \sum_{k\in D_n}b_k(n) + O\bigl(\Sigma_c(n)\bigr). $$

And we can replace the tail with another one, even though the new tail might
be a terrible approximation to the old, because the tails don’t really matter:

$$
\begin{aligned}
\sum_{k\in T_n}a_k(n) &= \sum_{k\in T_n}\bigl(b_k(n) - b_k(n) + a_k(n)\bigr)\\
&= \sum_{k\in T_n}b_k(n) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_a(n)\bigr).
\end{aligned}
$$

When we evaluated the sum in (9.60), for example, we had

$$
\begin{aligned}
a_k(n) &= [0 \le k < n]\,g_k/(n-k),\\
b_k(n) &= g_k/n,\\
c_k(n) &= kg_k/n(n-k);
\end{aligned}
$$

the ranges of summation were

$$ D_n = \{0, 1, \ldots, n-1\}, \qquad T_n = \{n, n+1, \ldots\}; $$

and we found that

$$ \Sigma_a(n) = 0, \qquad \Sigma_b(n) = O\bigl((\log n)^2/n^2\bigr), \qquad \Sigma_c(n) = O\bigl((\log n)^3/n^2\bigr). $$

This led to (9.61).

Similarly, when we estimated $\Phi(n)$ in (9.55) we had

$$
\begin{aligned}
&a_k(n) = \mu(k)\lfloor n/k\rfloor\lfloor 1 + n/k\rfloor, \quad b_k(n) = \mu(k)n^2/k^2, \quad c_k(n) = n/k;\\
&D_n = \{1, 2, \ldots, n\}, \qquad T_n = \{n+1, n+2, \ldots\}.
\end{aligned}
$$

We derived (9.56) by observing that $\Sigma_a(n) = 0$, $\Sigma_b(n) = O(n)$, and $\Sigma_c(n) =
O(n\log n)$.
Here’s another example where tail switching is effective. (Unlike our
previous examples, this one illustrates the trick in its full generality, with
$\Sigma_a(n) \ne 0$.) We seek the asymptotic value of

$$ \sum_{k\ge0}\frac{\ln(n + 2^k)}{k!}. $$

Also, horses switch
their tails when
feeding time ap-
proaches.

The big contributions to this sum occur when $k$ is small, because of the $k!$ in
the denominator. In this range we have

$$ \ln(n + 2^k) = \ln n + \frac{2^k}{n} - \frac{2^{2k}}{2n^2} + O\Bigl(\frac{2^{3k}}{n^3}\Bigr). \tag{9.64} $$

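We can prove that this estimate holds for $0 \le k < \lfloor\lg n\rfloor$, since the original
terms that have been truncated with $O$ are bounded by the convergent series

$$ \sum_{m\ge3}\frac{2^{km}}{mn^m} \le \frac{2^{3k}}{n^3}\sum_{m\ge3}\frac{2^{k(m-3)}}{n^{m-3}} \le \frac{2^{3k}}{n^3}\cdot\Bigl(1 + \frac12 + \frac14 + \cdots\Bigr). $$

(In this range, $2^k/n \le 2^{\lfloor\lg n\rfloor-1}/n \le \frac12$.)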

Therefore we can apply the three-step method just described, with

$$
\begin{aligned}
a_k(n) &= \ln(n + 2^k)/k!,\\
b_k(n) &= \bigl(\ln n + 2^k/n - 4^k/2n^2\bigr)/k!,\\
c_k(n) &= 8^k/n^3k!;\\
D_n &= \{0, 1, \ldots, \lfloor\lg n\rfloor - 1\},\\
T_n &= \{\lfloor\lg n\rfloor, \lfloor\lg n\rfloor + 1, \ldots\}.
\end{aligned}
$$

All we have to do is find good bounds on the three $\Sigma$’s in (9.63), and we’ll
know that $\sum_{k\ge0}a_k(n) \approx \sum_{k\ge0}b_k(n)$.

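The error we have committed in the dominant part of the sum, $\Sigma_c(n) =
\sum_{k\in D_n}8^k/n^3k!$, is obviously bounded by $\sum_{k\ge0}8^k/n^3k! = e^8/n^3$, so it can
be replaced by $O(n^{-3})$. The new tail error is

$$
\begin{aligned}
|\Sigma_b(n)| &= \Bigl|\sum_{k\ge\lfloor\lg n\rfloor}b_k(n)\Bigr|
 < \sum_{k\ge\lfloor\lg n\rfloor}\frac{\ln n + 2^k + 4^k}{k!}\\
&< \frac{\ln n + 2^{\lfloor\lg n\rfloor} + 4^{\lfloor\lg n\rfloor}}{\lfloor\lg n\rfloor!}\sum_{k\ge0}\frac{4^k}{k!}
 = O\Bigl(\frac{n^2}{\lfloor\lg n\rfloor!}\Bigr).
\end{aligned}
$$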
“We may not be big,
but we’re small.” 


Since $\lfloor\lg n\rfloor!$ grows faster than any power of $n$, this minuscule error is overwhelmed
by $\Sigma_c(n) = O(n^{-3})$. The error that comes from the original tail,

$$ \Sigma_a(n) = \sum_{k\ge\lfloor\lg n\rfloor}a_k(n) < \sum_{k\ge\lfloor\lg n\rfloor}\frac{k + \ln n}{k!}, $$

is smaller yet.

Finally, it’s easy to sum $\sum_{k\ge0}b_k(n)$ in closed form, and we have obtained
the desired asymptotic formula:

$$ \sum_{k\ge0}\frac{\ln(n + 2^k)}{k!} = e\ln n + \frac{e^2}{n} - \frac{e^4}{2n^2} + O\Bigl(\frac{1}{n^3}\Bigr). \tag{9.65} $$

The method we’ve used makes it clear that, in fact,

$$ \sum_{k\ge0}\frac{\ln(n + 2^k)}{k!} = e\ln n + \sum_{k=1}^{m-1}(-1)^{k+1}\frac{e^{2^k}}{kn^k} + O\Bigl(\frac{1}{n^m}\Bigr), \tag{9.66} $$

for any fixed $m > 0$. (This is a truncation of a series that diverges for all
fixed $n$ if we let $m \to \infty$.)
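Formula (9.65) can be checked numerically. A sketch under our own assumptions: the infinite sum is truncated after 120 terms, which is far beyond double precision because of the $1/k!$ factor, and the function names `lhs` and `rhs` are hypothetical. For a sharp test we add an elementary fact that is not in the text: for $x \ge 0$ we have $x - x^2/2 \le \ln(1+x) \le x - x^2/2 + x^3/3$. With $x = 2^k/n$ this gives $0 \le \text{lhs} - \text{rhs} \le e^8/(3n^3)$, an explicit version of the $O(1/n^3)$ term.

```python
import math


def lhs(n, terms=120):
    """sum_{k>=0} ln(n + 2^k)/k!, truncated after `terms` terms."""
    total, inv_fact = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            inv_fact /= k  # inv_fact is now 1/k!
        total += math.log(n + 2 ** k) * inv_fact  # math.log accepts big ints
    return total


def rhs(n):
    """(9.65) without its O(1/n^3) term."""
    e = math.e
    return e * math.log(n) + e ** 2 / n - e ** 4 / (2 * n * n)


for n in (10, 100, 1000, 10000):
    diff = lhs(n) - rhs(n)
    print(n, lhs(n), rhs(n), diff, math.e ** 8 / (3 * n ** 3))
```

The last two columns show the actual error beside our explicit bound $e^8/(3n^3)$.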

There’s only one flaw in our solution: We were too cautious. We derived
(9.64) on the assumption that $k < \lfloor\lg n\rfloor$, but exercise 53 proves that
the stated estimate is actually valid for all values of $k$. If we had known
the stronger general result, we wouldn’t have had to use the two-tail trick;
we could have gone directly to the final formula! But later we’ll encounter
problems where exchange of tails is the only decent approach available.


9.5 EULER’S SUMMATION FORMULA 

And now for our next trick — which is, in fact, the last important 
technique that will be discussed in this book — we turn to a general method of 
approximating sums that was first published by Leonhard Euler [101] in 1732. 
(The idea is sometimes also associated with the name of Colin Maclaurin, a 
professor of mathematics at Edinburgh who discovered it independently a 
short time later [263, page 305].) 

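Here’s the formula:

$$ \sum_{a\le k<b}f(k) = \int_a^b f(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_a^b + R_m, \tag{9.67} $$

$$ \text{where } R_m = (-1)^{m+1}\int_a^b\frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx, \qquad \text{integers } a \le b;\ \text{integer } m \ge 1. \tag{9.68} $$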
On the left is a typical sum that we might want to evaluate. On the right is
another expression for that sum, involving integrals and derivatives. If $f(x)$ is
a sufficiently “smooth” function, it will have $m$ derivatives $f'(x), \ldots, f^{(m)}(x)$,
and this formula turns out to be an identity. The right-hand side is often an
excellent approximation to the sum on the left, in the sense that the remainder
$R_m$ is often small. For example, we’ll see that Stirling’s approximation
for $n!$ is a consequence of Euler’s summation formula; so is our asymptotic
approximation for the harmonic number $H_n$.

The numbers $B_k$ in (9.67) are the Bernoulli numbers that we met in
Chapter 6; the function $B_m(\{x\})$ in (9.68) is the Bernoulli polynomial that we
met in Chapter 7. The notation $\{x\}$ stands for the fractional part $x - \lfloor x\rfloor$, as
in Chapter 3. Euler’s summation formula sort of brings everything together.

Let’s recall the values of small Bernoulli numbers, since it’s always handy
to have them listed near Euler’s general formula:

$$ B_0 = 1,\quad B_1 = -\tfrac12,\quad B_2 = \tfrac16,\quad B_4 = -\tfrac1{30},\quad B_6 = \tfrac1{42},\quad B_8 = -\tfrac1{30}; $$

$$ B_3 = B_5 = B_7 = B_9 = B_{11} = \cdots = 0. $$
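These values, and Jakob Bernoulli’s use of them, can be reproduced in a few lines. A minimal sketch, assuming the Chapter 6 recurrence $\sum_{j=0}^{m}\binom{m+1}{j}B_j = [m=0]$ (with $B_1 = -\frac12$). It then checks the power-sum formula $\sum_{0\le k<n}k^m = \frac{1}{m+1}\sum_{j=0}^{m}\binom{m+1}{j}B_j\,n^{m+1-j}$, which is what Euler’s formula yields for $f(x) = x^m$ because the remainder vanishes. The function names are our own.

```python
from fractions import Fraction
from math import comb


def bernoulli(count):
    """B_0, ..., B_{count-1} from sum_{j=0}^{m} C(m+1, j) B_j = [m == 0]."""
    B = []
    for m in range(count):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))  # C(m+1, m) = m + 1
    return B


def power_sum(m, n, B):
    """0^m + 1^m + ... + (n-1)^m from Bernoulli numbers (B_1 = -1/2 convention)."""
    total = sum(comb(m + 1, j) * B[j] * n ** (m + 1 - j) for j in range(m + 1))
    return total / (m + 1)


B = bernoulli(12)
print([str(b) for b in B])
print(power_sum(3, 10, B), sum(k ** 3 for k in range(10)))
```

Exact fractions matter here: the identity is exact, and floating point would blur the vanishing odd-index numbers.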

Jakob Bernoulli discovered these numbers when studying the sums of powers
of integers, and Euler’s formula explains why: If we set $f(x) = x^{m-1}$, we have
$f^{(m)}(x) = 0$; hence $R_m = 0$, and (9.67) reduces to
Setting $\epsilon = 1$ tells us that

$$
\begin{aligned}
\Delta f(x) &= f(x+1) - f(x)\\
&= f'(x)/1! + f''(x)/2! + f'''(x)/3! + \cdots\\
&= (D/1! + D^2/2! + D^3/3! + \cdots)\,f(x) = (e^D - 1)\,f(x).
\end{aligned}
\tag{9.69}
$$

Here $e^D$ stands for the differential operation $1 + D/1! + D^2/2! + D^3/3! + \cdots$.
Since $\Delta = e^D - 1$, the inverse operator $\Sigma = 1/\Delta$ should be $1/(e^D - 1)$; and
we know from Table 352 that $z/(e^z - 1) = \sum_{k\ge0}B_kz^k/k!$ is a power series
involving Bernoulli numbers. Thus

$$ \Sigma = \frac{B_0}{D} + \frac{B_1}{1!} + \frac{B_2}{2!}D + \frac{B_3}{3!}D^2 + \cdots = \int + \sum_{k\ge1}\frac{B_k}{k!}D^{k-1}. \tag{9.70} $$

Applying this operator equation to $f(x)$ and attaching limits yields

$$ \sum\nolimits_a^b f(x)\,\delta x = \int_a^b f(x)\,dx + \sum_{k\ge1}\frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_a^b, \tag{9.71} $$

which is exactly Euler’s summation formula (9.67) without the remainder
term. (Euler did not, in fact, consider the remainder, nor did anybody else
until S. D. Poisson [295] published an important memoir about approximate
summation in 1823. The remainder term is important, because the infinite
sum $\sum_{k\ge1}(B_k/k!)\,f^{(k-1)}(x)\big|_a^b$ often diverges. Our derivation of (9.71) has
been purely formal, without regard to convergence.)

Now let’s prove (9.67), with the remainder included. It suffices to prove
the case $a = 0$ and $b = 1$, namely

$$ f(0) = \int_0^1 f(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_0^1 - (-1)^m\int_0^1\frac{B_m(x)}{m!}\,f^{(m)}(x)\,dx, $$

because we can then replace $f(x)$ by $f(x + l)$ for any integer $l$, getting

$$ f(l) = \int_l^{l+1} f(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_l^{l+1} - (-1)^m\int_l^{l+1}\frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx. $$

The general formula (9.67) is just the sum of this identity over the range
$a \le l < b$, because intermediate terms telescope nicely.

The proof when $a = 0$ and $b = 1$ is by induction on $m$, starting with
$m = 1$:

$$ f(0) = \int_0^1 f(x)\,dx - \tfrac12\bigl(f(1) - f(0)\bigr) + \int_0^1(x - \tfrac12)\,f'(x)\,dx. $$

(The Bernoulli polynomial $B_m(x)$ is defined by the equation

$$ B_m(x) = \binom m0B_0x^m + \binom m1B_1x^{m-1} + \cdots + \binom mmB_mx^0 \tag{9.72} $$

in general, hence $B_1(x) = x - \frac12$ in particular.) In other words, we want to
prove that

$$ \frac{f(0) + f(1)}{2} = \int_0^1 f(x)\,dx + \int_0^1(x - \tfrac12)\,f'(x)\,dx. $$

But this is just a special case of the formula

$$ u(x)v(x)\Big|_0^1 = \int_0^1 u(x)\,dv(x) + \int_0^1 v(x)\,du(x) \tag{9.73} $$

for integration by parts, with $u(x) = f(x)$ and $v(x) = x - \frac12$. Hence the case
$m = 1$ is easy.

To pass from $m - 1$ to $m$ and complete the induction when $m > 1$, we
need to show that $R_{m-1} = (B_m/m!)\,f^{(m-1)}(x)\big|_0^1 + R_m$, namely that

$$ (-1)^m\int_0^1\frac{B_{m-1}(x)}{(m-1)!}\,f^{(m-1)}(x)\,dx = \frac{B_m}{m!}\,f^{(m-1)}(x)\Big|_0^1 + (-1)^{m+1}\int_0^1\frac{B_m(x)}{m!}\,f^{(m)}(x)\,dx. $$

This reduces to the equation

$$ (-1)^mB_m\,f^{(m-1)}(x)\Big|_0^1 = m\int_0^1 B_{m-1}(x)\,f^{(m-1)}(x)\,dx + \int_0^1 B_m(x)\,f^{(m)}(x)\,dx. $$

Once again (9.73) applies to these two integrals, with $u(x) = f^{(m-1)}(x)$ and
$v(x) = B_m(x)$, because the derivative of the Bernoulli polynomial (9.72) is

$$
\begin{aligned}
\frac{d}{dx}\sum_k\binom mkB_kx^{m-k} &= \sum_k\binom mk(m-k)B_kx^{m-k-1}\\
&= m\sum_k\binom{m-1}{k}B_kx^{m-1-k} = mB_{m-1}(x).
\end{aligned}
\tag{9.74}
$$

(The absorption identity (5.7) was useful here.) Therefore the required formula
will hold if and only if

$$ (-1)^mB_m\,f^{(m-1)}(x)\Big|_0^1 = B_m(x)\,f^{(m-1)}(x)\Big|_0^1. $$


Will the authors 
never get serious? 
In other words, we need to have

$$ (-1)^mB_m = B_m(1) = B_m(0), \qquad\text{for } m > 1. \tag{9.75} $$

This is a bit embarrassing, because $B_m(0)$ is obviously equal to $B_m$, not
to $(-1)^mB_m$. But there’s no problem really, because $m > 1$; we know that
$B_m$ is zero when $m$ is odd. (Still, that was a close call.)

To complete the proof of Euler’s summation formula we need to show
that $B_m(1) = B_m(0)$, which is the same as saying that

$$ \sum_k\binom mkB_k = B_m, \qquad\text{for } m > 1. $$

But this is just the definition of Bernoulli numbers, (6.79), so we’re done.
The identity $B_m'(x) = mB_{m-1}(x)$ implies that

$$ \int_0^1 B_m(x)\,dx = \frac{B_{m+1}(1) - B_{m+1}(0)}{m+1}, $$

and we know now that this integral is zero when $m \ge 1$. Hence the remainder
term in Euler’s formula,

$$ R_m = \frac{(-1)^{m+1}}{m!}\int_a^b B_m(\{x\})\,f^{(m)}(x)\,dx, $$

multiplies $f^{(m)}(x)$ by a function $B_m(\{x\})$ whose average value is zero. This
means that $R_m$ has a reasonable chance of being small.

Let’s look more closely at $B_m(x)$ for $0 \le x \le 1$, since $B_m(x)$ governs the
behavior of $R_m$. Here are the graphs for $B_m(x)$ for the first twelve values of $m$:

[Figure: graphs of the Bernoulli polynomials $B_m(x)$ on $0 \le x \le 1$ for $1 \le m \le 12$.]

Although $B_3(x)$ through $B_9(x)$ are quite small, the Bernoulli polynomials
and numbers ultimately get quite large. Fortunately $R_m$ has a compensating
factor $1/m!$, which helps to calm things down.
The graph of $B_m(x)$ begins to look very much like a sine wave when
$m \ge 3$; exercise 58 proves that $B_m(x)$ can in fact be well approximated by a
negative multiple of $\cos(2\pi x - \frac12\pi m)$, with relative error $1/2^m$.

In general, $B_{4k+1}(x)$ is negative for $0 < x < \frac12$ and positive for $\frac12 < x < 1$.
Therefore its integral, $B_{4k+2}(x)/(4k+2)$, decreases for $0 < x < \frac12$ and increases
for $\frac12 < x < 1$. Moreover, we have

$$ B_{4k+1}(1 - x) = -B_{4k+1}(x), \qquad\text{for } 0 \le x \le 1, $$

and it follows that

$$ B_{4k+2}(1 - x) = B_{4k+2}(x), \qquad\text{for } 0 \le x \le 1. $$

The constant term $B_{4k+2}$ causes the integral $\int_0^1 B_{4k+2}(x)\,dx$ to be zero; hence
$B_{4k+2} > 0$. The integral of $B_{4k+2}(x)$ is $B_{4k+3}(x)/(4k+3)$, which must therefore
be positive when $0 < x < \frac12$ and negative when $\frac12 < x < 1$; furthermore
$B_{4k+3}(1 - x) = -B_{4k+3}(x)$, so $B_{4k+3}(x)$ has the properties stated for $B_{4k+1}(x)$,
but negated. Therefore $B_{4k+4}(x)$ has the properties stated for $B_{4k+2}(x)$, but
negated. Therefore $B_{4k+5}(x)$ has the properties stated for $B_{4k+1}(x)$; we have
completed a cycle that establishes the stated properties inductively for all $k$.

According to this analysis, the maximum value of $|B_{2m}(x)|$ must occur
either at $x = 0$ or at $x = \frac12$. Exercise 17 proves that

$$ B_{2m}(\tfrac12) = (2^{1-2m} - 1)B_{2m}; \tag{9.76} $$

hence we have

$$ \bigl|B_{2m}(\{x\})\bigr| \le |B_{2m}|. \tag{9.77} $$
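Several facts used in the last few pages can be spot-checked in exact rational arithmetic: (9.74), (9.75), the zero average of $B_m(x)$ on $[0,1]$, (9.76), and (9.77). The sketch below builds $B_m(x)$ from (9.72) and again assumes the Chapter 6 recurrence for the Bernoulli numbers. Inequality (9.77) is only sampled on a grid, so that check is evidence, not proof. The helper names are our own.

```python
from fractions import Fraction
from math import comb


def bernoulli(count):
    """B_0, ..., B_{count-1} from sum_{j=0}^{m} C(m+1, j) B_j = [m == 0]."""
    B = []
    for m in range(count):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


def bernoulli_poly(m, B):
    """Coefficient list c with c[i] = coefficient of x^i in B_m(x), per (9.72)."""
    return [comb(m, m - i) * B[m - i] for i in range(m + 1)]


def evaluate(c, x):
    """Value at x of the polynomial with coefficient list c."""
    return sum(ci * x ** i for i, ci in enumerate(c))


B = bernoulli(24)
grid = [Fraction(j, 50) for j in range(51)]
report = []
for m in range(2, 21):
    p = bernoulli_poly(m, B)
    deriv = [i * p[i] for i in range(1, m + 1)]
    checks = [
        evaluate(p, 1) == evaluate(p, 0) == (-1) ** m * B[m],           # (9.75)
        deriv == [m * t for t in bernoulli_poly(m - 1, B)],             # (9.74)
        sum(ci / (i + 1) for i, ci in enumerate(p)) == 0,               # zero mean
    ]
    if m % 2 == 0:
        checks.append(evaluate(p, Fraction(1, 2))
                      == (Fraction(2) ** (1 - m) - 1) * B[m])           # (9.76)
        checks.append(max(abs(evaluate(p, x)) for x in grid) <= abs(B[m]))  # (9.77)
    report.append((m, all(checks)))
print(report)
```

Exact arithmetic makes each comparison an equality test rather than a tolerance judgment.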


This can be used to establish a useful upper bound on the remainder in Euler’s
summation formula, because we know from (6.89) that

$$ \frac{|B_{2m}|}{(2m)!} = \frac{2}{(2\pi)^{2m}}\sum_{k\ge1}\frac{1}{k^{2m}} = O\bigl((2\pi)^{-2m}\bigr), \qquad\text{when } m > 0. $$

Therefore we can rewrite Euler’s formula (9.67) as follows:

$$ \sum_{a\le k<b}f(k) = \int_a^b f(x)\,dx - \tfrac12f(x)\Big|_a^b + \sum_{k=1}^m\frac{B_{2k}}{(2k)!}\,f^{(2k-1)}(x)\Big|_a^b + O\bigl((2\pi)^{-2m}\bigr)\int_a^b\bigl|f^{(2m)}(x)\bigr|\,dx. \tag{9.78} $$

For example, if $f(x) = e^x$, all derivatives are the same and this formula tells
us that $\sum_{a\le k<b}e^k = (e^b - e^a)\bigl(1 - \frac12 + B_2/2! + B_4/4! + \cdots + B_{2m}/(2m)!\bigr) +
O\bigl((2\pi)^{-2m}\bigr)$. Of course, we know that this sum is actually a geometric series,
equal to $(e^b - e^a)/(e - 1) = (e^b - e^a)\sum_{k\ge0}B_k/k!$.
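The $e^x$ example is easy to watch numerically. Dividing out $e^b - e^a$, the bracketed factor $1 - \frac12 + \sum_{k=1}^m B_{2k}/(2k)!$ should approach $1/(e-1)$ with an error of order $(2\pi)^{-2m}$. This sketch, which again assumes the Chapter 6 recurrence for $B_k$, prints the gap and the gap rescaled by $(2\pi)^{2m}$. The rescaled column should stay bounded.

```python
import math
from fractions import Fraction
from math import comb


def bernoulli(count):
    """B_0, ..., B_{count-1} from sum_{j=0}^{m} C(m+1, j) B_j = [m == 0]."""
    B = []
    for m in range(count):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - s) / (m + 1))
    return B


B = bernoulli(20)
target = 1 / (math.e - 1)  # sum_{k>=0} B_k/k!, the exact geometric-series factor
rows = []
for m in range(1, 8):
    bracket = 1 - 0.5 + sum(float(B[2 * k]) / math.factorial(2 * k)
                            for k in range(1, m + 1))
    gap = bracket - target
    rows.append((m, gap, gap * (2 * math.pi) ** (2 * m)))
    print(m, f"{gap:.3e}", f"{rows[-1][2]:.4f}")
```

The gaps alternate in sign because the nonzero $B_{2k}$ alternate in sign.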

If $f^{(2m)}(x) \ge 0$ for $a \le x \le b$, the integral $\int_a^b |f^{(2m)}(x)|\,dx$ is just $f^{(2m-1)}(x)\big|_a^b$, so we have

$$|R_{2m}| \le \frac{|B_{2m}|}{(2m)!}\, f^{(2m-1)}(x)\Big|_a^b;$$

in other words, the remainder is bounded by the magnitude of the final term 
(the term just before the remainder), in this case. We can give an even better 
estimate if we know that 


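$$f^{(2m+2)}(x) \ge 0 \quad\text{and}\quad f^{(2m+4)}(x) \ge 0, \qquad \text{for } a \le x \le b. \qquad (9.79)$$

For it turns out that this implies the relation

$$R_{2m} = \theta_m \frac{B_{2m+2}}{(2m+2)!} f^{(2m+1)}(x)\Big|_a^b, \qquad \text{for some } 0 < \theta_m < 1; \qquad (9.80)$$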

in other words, the remainder will then lie between 0 and the first discarded 
term in (9.78) — the term that would follow the final term if we increased m. 

Here’s the proof: Euler’s summation formula is valid for all $m$, and $B_{2m+1} = 0$ when $m > 0$; hence $R_{2m} = R_{2m+1}$, and the first discarded term must be

$$R_{2m} - R_{2m+2}.$$

We therefore want to show that $R_{2m}$ lies between $0$ and $R_{2m} - R_{2m+2}$; and this is true if and only if $R_{2m}$ and $R_{2m+2}$ have opposite signs. We claim that

$$f^{(2m+2)}(x) \ge 0 \ \text{for } a \le x \le b \quad\text{implies}\quad (-1)^m R_{2m} \ge 0. \qquad (9.81)$$

This, together with (9.79), will prove that $R_{2m}$ and $R_{2m+2}$ have opposite signs, so the proof of (9.80) will be complete.

It’s not difficult to prove (9.81) if we recall the definition of $R_{2m+1}$ and the facts we proved about the graph of $B_{2m+1}(x)$. Namely, we have

$$R_{2m} = R_{2m+1} = \int_a^b \frac{B_{2m+1}(\{x\})}{(2m+1)!} f^{(2m+1)}(x)\,dx,$$

and $f^{(2m+1)}(x)$ is increasing because its derivative $f^{(2m+2)}(x)$ is positive. (More precisely, $f^{(2m+1)}(x)$ is nondecreasing because its derivative is nonnegative.) The graph of $B_{2m+1}(\{x\})$ looks like $(-1)^{m+1}$ times a sine wave, so it is geometrically obvious that the second half of each sine wave is more influential than the first half when it is multiplied by an increasing function. This makes $(-1)^m R_{2m+1} \ge 0$, as desired. Exercise 16 proves the result formally.
9.6 FINAL SUMMATIONS


Now comes the summing up, as we prepare to conclude this book. 
We will apply Euler’s summation formula to some interesting and important 
examples. 


Summation 1: This one is too easy. 

But first we will consider an interesting unimportant example, namely a sum that we already know how to do. Let’s see what Euler’s summation formula tells us if we apply it to the telescoping sum

$$S_n = \sum_{1 \le k < n} \frac{1}{k(k+1)} = 1 - \frac1n.$$

It can’t hurt to embark on our first serious application of Euler’s formula with 
the asymptotic equivalent of training wheels. 

We might as well start by writing the function $f(x) = 1/x(x+1)$ in partial fraction form,

$$f(x) = \frac1x - \frac1{x+1},$$


since this makes it easier to integrate and differentiate. Indeed, we have $f'(x) = -1/x^2 + 1/(x+1)^2$ and $f''(x) = 2/x^3 - 2/(x+1)^3$; in general

$$f^{(k)}(x) = (-1)^k k!\Bigl(\frac{1}{x^{k+1}} - \frac{1}{(x+1)^{k+1}}\Bigr), \qquad \text{for } k \ge 0.$$


Furthermore

$$\int_1^n f(x)\,dx = \ln x - \ln(x+1)\Big|_1^n = \ln\frac{2n}{n+1}.$$


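Plugging this into the summation formula (9.67) gives

$$S_n = \ln\frac{2n}{n+1} - \sum_{k=1}^m (-1)^k \frac{B_k}{k}\Bigl(\frac{1}{n^k} - \frac{1}{(n+1)^k} - 1 + \frac{1}{2^k}\Bigr) + R_m(n),$$

where $R_m(n) = -\int_1^n B_m(\{x\})\Bigl(\dfrac{1}{x^{m+1}} - \dfrac{1}{(x+1)^{m+1}}\Bigr)dx$.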


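For example, the right-hand side when $m = 4$ is

$$\ln\frac{2n}{n+1} - \frac12\Bigl(\frac1n - \frac1{n+1} - \frac12\Bigr) - \frac1{12}\Bigl(\frac1{n^2} - \frac1{(n+1)^2} - \frac34\Bigr) + \frac1{120}\Bigl(\frac1{n^4} - \frac1{(n+1)^4} - \frac{15}{16}\Bigr) + R_4(n).$$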
This is kind of a mess; it certainly doesn’t look like the real answer $1 - n^{-1}$. But let’s keep going anyway, to see what we’ve got. We know how to expand the right-hand terms in negative powers of $n$ up to, say, $O(n^{-5})$:


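$$\begin{aligned}
\ln\frac{n}{n+1} &= -n^{-1} + \tfrac12 n^{-2} - \tfrac13 n^{-3} + \tfrac14 n^{-4} + O(n^{-5});\\
\frac{1}{n+1} &= n^{-1} - n^{-2} + n^{-3} - n^{-4} + O(n^{-5});\\
\frac{1}{(n+1)^2} &= n^{-2} - 2n^{-3} + 3n^{-4} + O(n^{-5});\\
\frac{1}{(n+1)^4} &= n^{-4} + O(n^{-5}).
\end{aligned}$$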

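Therefore the terms on the right of our approximation add up to

$$\begin{aligned}
&\ln 2 + \tfrac14 + \tfrac1{16} - \tfrac1{128} + \bigl(-1 - \tfrac12 + \tfrac12\bigr)n^{-1} + \bigl(\tfrac12 - \tfrac12 - \tfrac1{12} + \tfrac1{12}\bigr)n^{-2}\\
&\qquad + \bigl(-\tfrac13 + \tfrac12 - \tfrac16\bigr)n^{-3} + \bigl(\tfrac14 - \tfrac12 + \tfrac14 + \tfrac1{120} - \tfrac1{120}\bigr)n^{-4} + R_4(n)\\
&\quad = \ln 2 + \tfrac{39}{128} - n^{-1} + R_4(n) + O(n^{-5}).
\end{aligned}$$

The coefficients of $n^{-2}$, $n^{-3}$, and $n^{-4}$ cancel nicely, as they should.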

If all were well with the world, we would be able to show that $R_4(n)$ is asymptotically small, maybe $O(n^{-5})$, and we would have an approximation to the sum. But we can’t possibly show this, because we happen to know that the correct constant term is 1, not $\ln 2 + \frac{39}{128}$ (which is approximately 0.9978). So $R_4(n)$ is actually equal to $\frac{89}{128} - \ln 2 + O(n^{-4})$, but Euler’s summation formula doesn’t tell us this.

In other words, we lose. 

One way to try fixing things is to notice that the constant terms in the approximation form a pattern, if we let $m$ get larger and larger:

$$\ln 2 - \tfrac12 B_1 + \tfrac12\cdot\tfrac34 B_2 - \tfrac13\cdot\tfrac78 B_3 + \tfrac14\cdot\tfrac{15}{16} B_4 - \tfrac15\cdot\tfrac{31}{32} B_5 + \cdots.$$

Perhaps we can show that this series approaches 1 as the number of terms becomes infinite? But no; the Bernoulli numbers get very large. For example, $B_{22} = \frac{854513}{138} > 6192$; therefore $|R_{22}(n)|$ will be much larger than $|R_4(n)|$.
We lose totally. 
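To see the divergence concretely, here is a small Python sketch (our own illustration, with hypothetical helper names). It evaluates the partial sums $\ln 2 + \sum_{k=1}^{m} (-1)^k (B_k/k)(1 - 2^{-k})$ of the constant-term series above using exact Bernoulli numbers. With $m = 4$ it reproduces $\ln 2 + \frac{39}{128} \approx 0.9978$; by $m = 30$ the partial sum exceeds $10^7$ in absolute value.

```python
from fractions import Fraction
from math import comb, log


def bernoulli_numbers(n_max):
    """Exact B_0..B_{n_max} (B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = 0."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def constant_term(m, B):
    """ln 2 + sum_{k=1}^{m} (-1)^k (B_k / k)(1 - 2^{-k}): the constant term of
    the Euler-summation approximation to S_n when it is stopped at m."""
    s = sum((-1) ** k * B[k] / k * (1 - Fraction(1, 2 ** k)) for k in range(1, m + 1))
    return log(2) + float(s)


B = bernoulli_numbers(30)
for m in (2, 4, 10, 20, 30):
    print(m, constant_term(m, B))
```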

There is a way out, however, and this escape route will turn out to be important in other applications of Euler’s formula. The key is to notice that $R_4(n)$ approaches a definite limit as $n \to \infty$:

$$\lim_{n\to\infty} R_4(n) = -\int_1^\infty B_4(\{x\})\Bigl(\frac{1}{x^5} - \frac{1}{(x+1)^5}\Bigr)dx = R_4(\infty).$$
The integral $\int_1^\infty B_m(\{x\}) f^{(m)}(x)\,dx$ will exist whenever $f^{(m)}(x) = O(x^{-2})$ as $x \to \infty$, and in this case $f^{(4)}(x)$ surely qualifies. Moreover, we have

$$R_4(n) = R_4(\infty) + \int_n^\infty B_4(\{x\})\Bigl(\frac{1}{x^5} - \frac{1}{(x+1)^5}\Bigr)dx = R_4(\infty) + O\Bigl(\int_n^\infty x^{-6}\,dx\Bigr) = R_4(\infty) + O(n^{-5}).$$

Thus we have used Euler’s summation formula to prove that

$$\sum_{1 \le k < n} \frac{1}{k(k+1)} = \ln 2 + \frac{39}{128} - n^{-1} + R_4(\infty) + O(n^{-5}) = C - n^{-1} + O(n^{-5})$$


for some constant C. We do not know what the constant is — some other 
method must be used to establish it — but Euler’s summation formula is able 
to let us deduce that the constant exists. 

Suppose we had chosen a much larger value of $m$. Then the same reasoning would tell us that

$$R_m(n) = R_m(\infty) + O(n^{-m-1}),$$

and we would have the formula

$$\sum_{1 \le k < n} \frac{1}{k(k+1)} = C - n^{-1} + c_2 n^{-2} + c_3 n^{-3} + \cdots + c_m n^{-m} + O(n^{-m-1})$$

for certain constants $c_2, c_3, \ldots$. We know that the $c$’s happen to be zero in this case; but let’s prove it, just to restore some of our confidence (in Euler’s formula if not in ourselves). The term $\ln\frac{n}{n+1}$ contributes $(-1)^m/m$ to $c_m$; the term $(-1)^{m+1}(B_m/m)n^{-m}$ contributes $(-1)^{m+1}B_m/m$; and the term $(-1)^k(B_k/k)(n+1)^{-k}$ contributes $(-1)^m\binom{m-1}{k-1}B_k/k$. Therefore

$$(-1)^m c_m = \frac1m - \frac{B_m}{m} + \sum_{k=1}^m \binom{m-1}{k-1}\frac{B_k}{k} = \frac1m\bigl(1 - B_m + B_m(1) - 1\bigr).$$


Sure enough, it’s zero, when $m > 1$. We have proved that

$$\sum_{1 \le k < n} \frac{1}{k(k+1)} = C - n^{-1} + O(n^{-m-1}), \qquad \text{for all } m \ge 1. \qquad (9.82)$$


This is not enough to prove that the sum is exactly equal to $C - n^{-1}$; the actual value might be $C - n^{-1} + 2^{-n}$ or something. But Euler’s summation formula does give us the error bound $O(n^{-m-1})$ for arbitrarily large $m$, even though we haven’t evaluated any remainders explicitly.
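The claim that $c_m = 0$ for $m > 1$ can also be confirmed mechanically. This Python sketch (an illustration of ours; the names are hypothetical) evaluates $(-1)^m c_m = \frac1m - \frac{B_m}{m} + \sum_{k=1}^m\binom{m-1}{k-1}B_k/k$ in exact rational arithmetic. It returns 1 for $m = 1$ (so $c_1 = -1$, matching the $-n^{-1}$ term) and 0 for every larger $m$ it tries.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Exact B_0..B_{n_max} (B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = 0."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def signed_c(m, B):
    """Return (-1)^m c_m = 1/m - B_m/m + sum_{k=1}^m C(m-1, k-1) B_k / k."""
    total = Fraction(1, m) - B[m] / m
    total += sum(comb(m - 1, k - 1) * B[k] / k for k in range(1, m + 1))
    return total


B = bernoulli_numbers(20)
print([signed_c(m, B) for m in range(1, 8)])
```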

Summation 1, again: Recapitulation and generalization.

Before we leave our training wheels, let’s review what we just did from a somewhat higher perspective. We began with a sum

$$S_n = \sum_{1 \le k < n} f(k)$$

and we used Euler’s summation formula to write

$$S_n = F(n) - F(1) + \sum_{k=1}^m \bigl(T_k(n) - T_k(1)\bigr) + R_m(n), \qquad (9.83)$$

where $F(x)$ was $\int f(x)\,dx$ and where $T_k(x)$ was a certain term involving $B_k$ and $f^{(k-1)}(x)$. We also noticed that there was a constant $c$ such that

$$f^{(m)}(x) = O(x^{c-m}) \quad \text{as } x \to \infty, \text{ for all large } m.$$


(Namely, $f(k)$ was $1/k(k+1)$; $F(x)$ was $\ln\bigl(x/(x+1)\bigr)$; $c$ was $-2$; and $T_k(x)$ was $(-1)^{k+1}(B_k/k)\bigl(x^{-k} - (x+1)^{-k}\bigr)$.) For all large enough values of $m$, this implied that the remainders had a small tail,

$$R'_m(n) = R_m(\infty) - R_m(n) = (-1)^{m+1}\int_n^\infty \frac{B_m(\{x\})}{m!} f^{(m)}(x)\,dx = O(n^{c+1-m}). \qquad (9.84)$$


Therefore we were able to conclude that there exists a constant $C$ such that

$$S_n = F(n) + C + \sum_{k=1}^m T_k(n) - R'_m(n). \qquad (9.85)$$

(Notice that $C$ nicely absorbed the $T_k(1)$ terms, which were a nuisance.)

We can save ourselves unnecessary work in future problems by simply asserting the existence of $C$ whenever $R_m(\infty)$ exists.

Now let’s suppose that $f^{(2m+2)}(x) \ge 0$ and $f^{(2m+4)}(x) \ge 0$ for $1 \le x \le n$. We have proved that this implies a simple bound (9.80) on the remainder,

$$R_{2m}(n) = \theta_{m,n}\bigl(T_{2m+2}(n) - T_{2m+2}(1)\bigr),$$

where $\theta_{m,n}$ lies somewhere between 0 and 1. But we don’t really want bounds that involve $R_{2m}(n)$ and $T_{2m+2}(1)$; after all, we got rid of $T_k(1)$ when we introduced the constant $C$. What we really want is a bound like

$$-R'_{2m}(n) = \phi_{m,n}\, T_{2m+2}(n),$$

where $0 < \phi_{m,n} < 1$; this will allow us to conclude from (9.85) that

$$S_n = F(n) + C + T_1(n) + \sum_{k=1}^m T_{2k}(n) + \phi_{m,n}\, T_{2m+2}(n), \qquad (9.86)$$

hence the remainder will truly be between zero and the first discarded term. 

A slight modification of our previous argument will patch things up perfectly. Let us assume that

$$f^{(2m+2)}(x) \ge 0 \quad\text{and}\quad f^{(2m+4)}(x) \ge 0, \qquad \text{as } x \to \infty. \qquad (9.87)$$


The right-hand side of (9.85) is just like the negative of the right-hand side of Euler’s summation formula (9.67) with $a = n$ and $b = \infty$, as far as remainder terms are concerned, and successive remainders are generated by induction on $m$. Therefore our previous argument can be applied.

Summation 2: Harmonic numbers harmonized.

Now that we’ve learned so much from a trivial (but safe) example, we can 
readily do a nontrivial one. Let us use Euler’s summation formula to derive 
the approximation for H n that we have been claiming for some time. 

In this case, $f(x) = 1/x$. We already know about the integral and derivatives of $f$, because of Summation 1; also $f^{(m)}(x) = O(x^{-m-1})$ as $x \to \infty$. Therefore we can immediately plug into formula (9.85):

$$\sum_{1 \le k < n} \frac1k = \ln n + C + B_1 n^{-1} - \sum_{k=1}^m \frac{B_{2k}}{2k\,n^{2k}} - R'_{2m}(n),$$


for some constant $C$. The sum on the left is $H_{n-1}$, not $H_n$; but it’s more convenient to work with $H_{n-1}$ and to add $1/n$ later, than to mess around with $(n+1)$’s on the right-hand side. The $B_1 n^{-1}$ will then become $(B_1 + 1)n^{-1} = 1/(2n)$. Let us call the constant $\gamma$ instead of $C$, since Euler’s constant $\gamma$ is, in fact, defined to be $\lim_{n\to\infty}(H_n - \ln n)$.

The remainder term can be estimated nicely by the theory we developed a minute ago, because $f^{(2m)}(x) = (2m)!/x^{2m+1} \ge 0$ for all $x > 0$. Therefore (9.86) tells us that


$$H_n = \ln n + \gamma + \frac{1}{2n} - \sum_{k=1}^m \frac{B_{2k}}{2k\,n^{2k}} - \theta_{m,n}\frac{B_{2m+2}}{(2m+2)\,n^{2m+2}}, \qquad (9.88)$$

where $\theta_{m,n}$ is some fraction between 0 and 1. This is the general formula whose first few terms are listed in Table 452. For example, when $m = 2$ we get

$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{\theta_{2,n}}{252n^6}. \qquad (9.89)$$
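Formula (9.89) can be tested directly. The Python sketch below is illustrative only: it hardcodes Euler’s constant to double precision from the digits quoted in (9.90), and checks that the difference $H_n - (\ln n + \gamma + \frac1{2n} - \frac1{12n^2} + \frac1{120n^4})$ falls strictly between $-1/(252n^6)$ and 0, as (9.89) predicts.

```python
from math import log

GAMMA = 0.5772156649015329  # Euler's constant, the digits of (9.90) in double precision


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed directly."""
    return sum(1.0 / k for k in range(1, n + 1))


def h_approx(n):
    """ln n + gamma + 1/(2n) - 1/(12n^2) + 1/(120n^4): (9.89) minus its theta term."""
    return log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n**2) + 1 / (120 * n**4)


for n in (5, 10, 20):
    diff = harmonic(n) - h_approx(n)
    # (9.89) predicts -1/(252 n^6) < diff < 0.
    print(n, diff, -1 / (252 * n**6))
```

We stop at $n = 20$ because for much larger $n$ the margin inside the interval falls below double-precision rounding error.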
This equation, incidentally, gives us a good approximation to $\gamma$ even when $n = 2$:

$$\gamma = H_2 - \ln 2 - \frac14 + \frac1{48} - \frac1{1920} + \epsilon = 0.577165\ldots + \epsilon,$$

where $\epsilon$ is between zero and $\frac{1}{16128}$. If we take $n = 10^4$ and $m = 250$, we get the value of $\gamma$ correct to 1271 decimal places, beginning thus [205]:

$$\gamma = 0.57721\,56649\,01532\,86060\,65120\,90082\,40243\ldots. \qquad (9.90)$$
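The $n = 2$ estimate above is simple enough to reproduce in a few lines of Python. This is a sketch of ours; the hardcoded value of $\gamma$ is simply (9.90) rounded to double precision. It confirms that $\epsilon$ is positive and below $1/16128$.

```python
from math import log

GAMMA = 0.5772156649015329  # (9.90) rounded to double precision

# (9.89) with n = 2 (so H_2 = 1.5), solved for gamma with the theta term dropped.
estimate = 1.5 - log(2) - 1 / 4 + 1 / 48 - 1 / 1920
epsilon = GAMMA - estimate
print(estimate, epsilon, 1 / 16128)  # expect 0 < epsilon < 1/16128
```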


Heisenberg may 
have been here. 


But Euler’s constant appears also in other formulas that allow it to be eval- 
uated even more efficiently [345]. 

Summation 3: Stirling’s approximation. 

If $f(x) = \ln x$, we have $f'(x) = 1/x$, so we can evaluate the sum of logarithms using almost the same calculations as we did when summing reciprocals. Euler’s summation formula yields

$$\sum_{1 \le k < n} \ln k = n\ln n - n + \sigma - \frac{\ln n}{2} + \sum_{k=1}^m \frac{B_{2k}}{2k(2k-1)n^{2k-1}} + \phi_{m,n}\frac{B_{2m+2}}{(2m+2)(2m+1)n^{2m+1}},$$

where $\sigma$ is a certain constant, “Stirling’s constant,” and $0 < \phi_{m,n} < 1$. (In this case $f^{(2m)}(x)$ is negative, not positive; but we can still say that the remainder is governed by the first discarded term, because we could have started with $f(x) = -\ln x$ instead of $f(x) = \ln x$.) Adding $\ln n$ to both sides gives


$$\ln n! = n\ln n - n + \frac{\ln n}{2} + \sigma + \frac{1}{12n} - \frac{1}{360n^3} + \frac{\phi_{2,n}}{1260n^5} \qquad (9.91)$$

when $m = 2$. And we can get the approximation in Table 452 by taking ‘exp’ of both sides. (The value of $e^\sigma$ turns out to be $\sqrt{2\pi}$, but we aren’t quite ready to derive that formula. In fact, Stirling didn’t discover the closed form for $\sigma$ until several years after de Moivre [76] had proved that the constant exists.)
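Here is a hedged Python sketch of (9.91) with its final $\phi_{2,n}$ term dropped. It assumes $\sigma = \ln\sqrt{2\pi}$, the value the text says $\sigma$ turns out to have (derived in Summation 5), and the function name is hypothetical. For $n = 10$ the exponentiated result lands just below $10! = 3628800$, within the relative error $10^{-8}$ that Exercise 26 describes, so rounding recovers $10!$ exactly.

```python
from math import exp, factorial, log, pi

SIGMA = 0.5 * log(2 * pi)  # assumed value of Stirling's constant, ln sqrt(2 pi)


def ln_factorial_stirling(n):
    """(9.91) with its final phi_{2,n} / (1260 n^5) term dropped."""
    return n * log(n) - n + log(n) / 2 + SIGMA + 1 / (12 * n) - 1 / (360 * n**3)


approx = exp(ln_factorial_stirling(10))
print(approx, factorial(10) - approx)  # slightly below 10!, by less than 10! * 1e-8
```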

If $m$ is fixed and $n \to \infty$, the general formula gives a better and better approximation to $\ln n!$ in the sense of absolute error, hence it gives a better and better approximation to $n!$ in the sense of relative error. But if $n$ is fixed and $m$ increases, the error bound $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ decreases to a certain point and then begins to increase. Therefore the approximation reaches a point beyond which a sort of uncertainty principle limits the amount by which $n!$ can be approximated.
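The “uncertainty principle” is easy to observe numerically. The following Python sketch (our own illustration; names are hypothetical) computes the bound $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ exactly for $m = 0, 1, \ldots, 40$ and reports the $m$ at which it is smallest. Because $|B_{2m+2}|$ grows roughly like $(2m+2)!/(2\pi)^{2m+2}$, successive bounds shrink only while $(2m+2)(2m+1) < (2\pi n)^2$, so the optimum sits near $m \approx \pi n$: it is $m = 3$ for $n = 1$, $m = 6$ for $n = 2$, and $m = 9$ for $n = 3$.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Exact B_0..B_{n_max} (B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = 0."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


M_MAX = 40
B = bernoulli_numbers(2 * M_MAX + 2)


def error_bound(m, n):
    """|B_{2m+2}| / ((2m+2)(2m+1) n^{2m+1}), the first discarded term for ln n!."""
    return abs(B[2 * m + 2]) / ((2 * m + 2) * (2 * m + 1) * Fraction(n) ** (2 * m + 1))


def best_m(n):
    """The m in 0..M_MAX whose error bound is smallest, for this fixed n."""
    return min(range(M_MAX + 1), key=lambda m: error_bound(m, n))


for n in (1, 2, 3):
    m = best_m(n)
    print(n, m, float(error_bound(m, n)))
```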





In Chapter 5, equation (5.83), we generalized factorials to arbitrary real $\alpha$ by using a definition

$$\frac{1}{\alpha!} = \lim_{n\to\infty} \binom{n+\alpha}{n} n^{-\alpha}$$

suggested by Euler. Suppose $\alpha$ is a large number; then

$$\ln \alpha! = \lim_{n\to\infty}\Bigl(\alpha\ln n + \ln n! - \sum_{k=1}^n \ln(\alpha + k)\Bigr),$$

and Euler’s summation formula can be used with $f(x) = \ln(x + \alpha)$ to estimate this sum:

$$\begin{aligned}
\sum_{k=1}^n \ln(k + \alpha) &= F_m(\alpha, n) - F_m(\alpha, 0) + R_{2m}(\alpha, n),\\
F_m(\alpha, x) &= (x + \alpha)\ln(x + \alpha) - x + \frac{\ln(x + \alpha)}{2} + \sum_{k=1}^m \frac{B_{2k}}{2k(2k-1)(x + \alpha)^{2k-1}},\\
R_{2m}(\alpha, n) &= \int_0^n \frac{B_{2m}(\{x\})}{2m}\,\frac{dx}{(x + \alpha)^{2m}}.
\end{aligned}$$

(Here we have used (9.67) with $a = 0$ and $b = n$, then added $\ln(n + \alpha) - \ln\alpha$ to both sides.) If we subtract this approximation for $\sum_{k=1}^n \ln(k + \alpha)$ from Stirling’s approximation for $\ln n!$, then add $\alpha\ln n$ and take the limit as $n \to \infty$, we get

$$\ln\alpha! = \alpha\ln\alpha - \alpha + \frac{\ln\alpha}{2} + \sigma + \sum_{k=1}^m \frac{B_{2k}}{2k(2k-1)\alpha^{2k-1}} - \int_0^\infty \frac{B_{2m}(\{x\})}{2m}\,\frac{dx}{(x + \alpha)^{2m}},$$

because $\alpha\ln n + n\ln n - n + \frac12\ln n - (n + \alpha)\ln(n + \alpha) + n - \frac12\ln(n + \alpha) \to -\alpha$ and the other terms not shown here tend to zero. Thus Stirling’s approximation behaves for generalized factorials (and for the Gamma function $\Gamma(\alpha + 1) = \alpha!$) exactly as for ordinary factorials.
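As a hedged check that the generalized formula behaves like the ordinary one, this Python sketch compares the $m = 2$ truncation (integral dropped) with `math.lgamma(alpha + 1)`, the standard-library log-Gamma function. It again assumes $\sigma = \ln\sqrt{2\pi}$, and it checks that the error is positive and below the first discarded term $1/(1260\alpha^5)$. That is the behavior the text’s claim (“exactly as for ordinary factorials”) leads us to expect.

```python
from math import lgamma, log, pi

SIGMA = 0.5 * log(2 * pi)  # assumed value of Stirling's constant


def ln_gen_factorial(alpha):
    """Stirling's approximation to ln(alpha!) = ln Gamma(alpha + 1), keeping the
    B_2 and B_4 terms (the m = 2 case of the formula above, integral dropped)."""
    return (alpha * log(alpha) - alpha + log(alpha) / 2 + SIGMA
            + 1 / (12 * alpha) - 1 / (360 * alpha**3))


for alpha in (2.5, 10.5):
    err = lgamma(alpha + 1) - ln_gen_factorial(alpha)
    print(alpha, err, 1 / (1260 * alpha**5))  # expect 0 < err < first discarded term
```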

Summation 4: A bell-shaped summand.

Let’s turn now to a sum that has quite a different flavor:

$$\Theta_n = \sum_k e^{-k^2/n} = \cdots + e^{-9/n} + e^{-4/n} + e^{-1/n} + 1 + e^{-1/n} + e^{-4/n} + e^{-9/n} + \cdots. \qquad (9.92)$$
This is a doubly infinite sum, whose terms reach their maximum value $e^0 = 1$ when $k = 0$. We call it $\Theta_n$ because it is a power series involving the quantity $e^{-1/n}$ raised to the $p(k)$th power, where $p(k)$ is a polynomial of degree 2; such power series are traditionally called “theta functions.” If $n = 10^{100}$, we have

$$e^{-k^2/n} = \begin{cases} e^{-0.01} \approx 0.99005, & \text{when } k = 10^{49};\\ e^{-1} \approx 0.36788, & \text{when } k = 10^{50};\\ e^{-100} < 10^{-43}, & \text{when } k = 10^{51}. \end{cases}$$


So the summand stays very near 1 until $k$ gets up to about $\sqrt n$, when it drops off and stays very near zero. We can guess that $\Theta_n$ will be proportional to $\sqrt n$. Here is a graph of $e^{-k^2/n}$ when $n = 10$:

[Figure: graph of $e^{-k^2/n}$ for $n = 10$.]

Larger values of $n$ just stretch the graph horizontally by a factor of $\sqrt n$.

We can estimate $\Theta_n$ by letting $f(x) = e^{-x^2/n}$ and taking $a = -\infty$, $b = +\infty$ in Euler’s summation formula. (If infinities seem too scary, let $a = -A$ and $b = +B$, then take limits as $A, B \to \infty$.) The integral of $f(x)$ is

$$\int_{-\infty}^{+\infty} e^{-x^2/n}\,dx = \sqrt n \int_{-\infty}^{+\infty} e^{-u^2}\,du = \sqrt n\,C,$$

if we replace $x$ by $u\sqrt n$. The value of $\int_{-\infty}^{+\infty} e^{-u^2}\,du$ is well known, but we’ll call it $C$ for now and come back to it after we have finished plugging into Euler’s summation formula.

The next thing we need to know is the sequence of derivatives $f'(x)$, $f''(x)$, $\ldots$, and for this purpose it’s convenient to set

$$f(x) = g(x/\sqrt n), \qquad g(x) = e^{-x^2}.$$

Then the chain rule of calculus says that

$$\frac{df(x)}{dx} = \frac{dg(y)}{dy}\,\frac{dy}{dx}, \qquad y = \frac{x}{\sqrt n},$$

and this is the same as saying that

$$f'(x) = \frac{1}{\sqrt n}\,g'(x/\sqrt n).$$

By induction we have

$$f^{(k)}(x) = n^{-k/2}\, g^{(k)}(x/\sqrt n).$$
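For example, we have $g'(x) = -2xe^{-x^2}$ and $g''(x) = (4x^2 - 2)e^{-x^2}$; hence

$$f'(x) = \frac{1}{\sqrt n}\Bigl(-2\frac{x}{\sqrt n}\Bigr)e^{-x^2/n}, \qquad f''(x) = \frac1n\Bigl(4\Bigl(\frac{x}{\sqrt n}\Bigr)^2 - 2\Bigr)e^{-x^2/n}.$$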

It’s easier to see what’s going on if we work with the simpler function g(x). 

We don’t have to evaluate the derivatives of $g(x)$ exactly, because we’re only going to be concerned about the limiting values when $x = \pm\infty$. And for this purpose it suffices to notice that every derivative of $g(x)$ is $e^{-x^2}$ times a polynomial in $x$:

$$g^{(k)}(x) = P_k(x)e^{-x^2}, \qquad \text{where } P_k \text{ is a polynomial of degree } k.$$

This follows by induction, since $g^{(k+1)}(x) = \bigl(P_k'(x) - 2xP_k(x)\bigr)e^{-x^2}$.
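The induction can be carried out mechanically on coefficient lists. In this Python sketch (an illustration; the function names are ours), each step applies $P_{k+1}(x) = P_k'(x) - 2xP_k(x)$, which comes from differentiating $P_k(x)e^{-x^2}$. The leading coefficient is always $(-2)^k$, so $P_k$ has degree exactly $k$. These $P_k$ are $(-1)^k$ times the (physicists’) Hermite polynomials.

```python
def next_poly(p):
    """Given the coefficients of P_k (p[i] multiplies x^i), return those of
    P_{k+1}, using g^{(k+1)}(x) = (P_k'(x) - 2x P_k(x)) e^{-x^2}."""
    out = [0] * (len(p) + 1)
    for i in range(1, len(p)):      # P_k'(x) contributes i * p[i] to x^{i-1}
        out[i - 1] += i * p[i]
    for i, c in enumerate(p):       # -2x P_k(x) contributes -2 p[i] to x^{i+1}
        out[i + 1] -= 2 * c
    return out


def gaussian_derivative_poly(k):
    """Coefficient list of P_k, where g^{(k)}(x) = P_k(x) e^{-x^2}."""
    p = [1]  # P_0 = 1, since g itself is e^{-x^2}
    for _ in range(k):
        p = next_poly(p)
    return p


for k in range(4):
    print(k, gaussian_derivative_poly(k))
```

For $k = 2$ this gives the list `[-2, 0, 4]`, that is, $4x^2 - 2$, in agreement with $g''(x)$ above.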

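The negative exponential $e^{-x^2}$ goes to zero much faster than $P_k(x)$ goes to infinity, when $x \to \pm\infty$, so we have

$$f^{(k)}(+\infty) = f^{(k)}(-\infty) = 0$$

for all $k \ge 0$. Therefore all of the terms

$$\frac{B_k}{k!} f^{(k-1)}(x)\Big|_{-\infty}^{+\infty}$$

vanish, and we are left with the term from $\int f(x)\,dx$ and the remainder:

$$\begin{aligned}
\Theta_n &= C\sqrt n + (-1)^{m+1}\int_{-\infty}^{+\infty} \frac{B_m(\{x\})}{m!} f^{(m)}(x)\,dx\\
&= C\sqrt n + \frac{(-1)^{m+1}}{n^{m/2}}\int_{-\infty}^{+\infty} \frac{B_m(\{x\})}{m!}\, g^{(m)}\Bigl(\frac{x}{\sqrt n}\Bigr)dx\\
&= C\sqrt n + \frac{(-1)^{m+1}}{n^{(m-1)/2}}\int_{-\infty}^{+\infty} \frac{B_m(\{u\sqrt n\})}{m!}\, P_m(u)e^{-u^2}\,du \qquad (x = u\sqrt n)\\
&= C\sqrt n + O(n^{(1-m)/2}).
\end{aligned}$$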


The $O$ estimate here follows since $|B_m(\{u\sqrt n\})|$ is bounded and the integral $\int_{-\infty}^{+\infty} |P(u)|e^{-u^2}\,du$ exists whenever $P$ is a polynomial. (The constant implied by this $O$ depends on $m$.)

We have proved that $\Theta_n = C\sqrt n + O(n^{-M})$, for arbitrarily large $M$; the difference between $\Theta_n$ and $C\sqrt n$ is “exponentially small.” Let us therefore determine the constant $C$ that plays such a big role in the value of $\Theta_n$.

One way to determine C is to look the integral up in a table; but we 
prefer to know how the value can be derived, so that we can do integrals even 
when they haven’t been tabulated. Elementary calculus suffices to evaluate $C$ if we are clever enough to look at the double integral

$$C^2 = \int_{-\infty}^{+\infty} e^{-x^2}\,dx \int_{-\infty}^{+\infty} e^{-y^2}\,dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-(x^2+y^2)}\,dx\,dy.$$


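Converting to polar coordinates gives

$$C^2 = \int_0^{2\pi}\int_0^\infty e^{-r^2}\, r\,dr\,d\theta = \frac12\int_0^{2\pi}\int_0^\infty e^{-u}\,du\,d\theta \quad (u = r^2) = \frac12\int_0^{2\pi} d\theta = \pi.$$

So $C = \sqrt\pi$. The fact that $x^2 + y^2 = r^2$ is the equation of a circle whose circumference is $2\pi r$ somehow explains why $\pi$ gets into the act.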
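Here is a numerical sanity check of $C = \sqrt\pi$, sketched in Python. It is illustrative only; the cutoff 12 is an arbitrary choice beyond which $e^{-u^2}$ is negligible in double precision. A Riemann sum with step $h$ is $h\sum_k e^{-k^2h^2}$, which is $h\,\Theta_{1/h^2}$ in the notation of (9.92). This is why even the coarse step $h = \frac12$ already agrees with $\sqrt\pi$ to within rounding error.

```python
from math import exp, pi, sqrt


def gaussian_integral(h, cutoff=12.0):
    """Riemann sum h * sum_j e^{-(jh)^2} for the integral of e^{-u^2} over the
    real line, truncated to |u| <= cutoff where the integrand is negligible."""
    n_steps = int(cutoff / h)
    return h * sum(exp(-(j * h) ** 2) for j in range(-n_steps, n_steps + 1))


for h in (1.0, 0.5, 0.25):
    print(h, gaussian_integral(h) - sqrt(pi))
```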

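Another way to evaluate $C$ is to replace $x$ by $\sqrt t$ and $dx$ by $\frac12 t^{-1/2}\,dt$:

$$C = \int_{-\infty}^{+\infty} e^{-x^2}\,dx = 2\int_0^{\infty} e^{-x^2}\,dx = \int_0^\infty t^{-1/2}e^{-t}\,dt.$$

This integral equals $\Gamma(\frac12)$, since $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ according to (5.84). Therefore we have demonstrated that $\Gamma(\frac12) = \sqrt\pi$.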

Our final formula, then, is

$$\Theta_n = \sum_k e^{-k^2/n} = \sqrt{\pi n} + O(n^{-M}), \qquad \text{for all fixed } M. \qquad (9.93)$$


The constant in the O depends on M; that’s why we say that M is “fixed.” 

When $n = 2$, for example, the infinite sum $\Theta_2$ is approximately equal to 2.506628288; this is already very close to $\sqrt{2\pi} \approx 2.506628275$, even though $n$ is quite small. The value of $\Theta_{100}$ agrees with $10\sqrt\pi$ to 427 decimal places! Exercise 59 uses advanced methods to derive a rapidly convergent series for $\Theta_n$; it turns out that

$$\Theta_n/\sqrt{\pi n} = 1 + 2e^{-\pi^2 n} + O(e^{-4\pi^2 n}). \qquad (9.94)$$
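Both (9.93) and (9.94) can be checked in double precision with a short Python sketch. This is our own illustration; the truncation point `k_max` is an assumption chosen so that every omitted term is below $e^{-100}$. For $n = 2$ it reproduces $\Theta_2 \approx 2.506628288$, and the relative excess of $\Theta_n$ over $\sqrt{\pi n}$ matches $2e^{-\pi^2 n}$.

```python
from math import exp, pi, sqrt


def theta(n):
    """Theta_n = sum over all integers k of e^{-k^2/n}, using symmetry in k and
    stopping once k^2/n > 100, where the remaining terms are negligible."""
    k_max = int(10 * sqrt(n)) + 10
    return 1 + 2 * sum(exp(-k * k / n) for k in range(1, k_max + 1))


for n in (1, 2, 3):
    excess = theta(n) / sqrt(pi * n) - 1
    print(n, theta(n), excess, 2 * exp(-pi**2 * n))  # (9.94) predicts excess close to 2e^{-pi^2 n}
```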


Summation 5: The clincher. 

Now we will do one last sum, which will turn out to tell us the value of Stirling’s constant $\sigma$. This last sum also illustrates many of the other techniques of this last chapter (and of this whole book), so it will be a fitting way for us to conclude our explorations of Concrete Mathematics.
The final task seems almost absurdly easy: We will try to find the asymptotic value of

$$A_n = \sum_k \binom{2n}{k}$$

by using Euler’s summation formula.

This is another case where we already know the answer (right?); but 
it’s always interesting to try new methods on old problems, so that we can 
compare facts and maybe discover something new. 

So we think big and realize that the main contribution to $A_n$ comes from the middle terms, near $k = n$. It’s almost always a good idea to choose notation so that the biggest contribution to a sum occurs near $k = 0$, because we can then use the tail-exchange trick to get rid of terms that have large $|k|$. Therefore we replace $k$ by $n + k$:

$$A_n = \sum_k \binom{2n}{n+k} = \sum_k \frac{(2n)!}{(n+k)!\,(n-k)!}.$$


Things are looking reasonably good, since we know how to approximate 
(n ± k)! when n is large and k is small. 

Now we want to carry out the three-step procedure associated with the tail-exchange trick. Namely, we want to write

$$\frac{(2n)!}{(n+k)!\,(n-k)!} = a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr), \qquad \text{for } k \in D_n,$$

so that we can obtain the estimate

$$A_n = \sum_k b_k(n) + O\Bigl(\sum_{k \notin D_n} a_k(n)\Bigr) + O\Bigl(\sum_{k \notin D_n} b_k(n)\Bigr) + \sum_{k \in D_n} O\bigl(c_k(n)\bigr).$$

Let us therefore try to estimate $\binom{2n}{n+k}$ in the region where $|k|$ is small. We could use Stirling’s approximation as it appears in Table 452, but it’s easier to work with the logarithmic equivalent in (9.91):

$$\begin{aligned}
\ln a_k(n) &= \ln(2n)! - \ln(n+k)! - \ln(n-k)!\\
&= 2n\ln 2n - 2n + \tfrac12\ln 2n + \sigma + O(n^{-1})\\
&\qquad - (n+k)\ln(n+k) + n + k - \tfrac12\ln(n+k) - \sigma + O\bigl((n+k)^{-1}\bigr)\\
&\qquad - (n-k)\ln(n-k) + n - k - \tfrac12\ln(n-k) - \sigma + O\bigl((n-k)^{-1}\bigr).
\end{aligned} \qquad (9.95)$$


We want to convert this to a nice, simple O estimate. 

The tail-exchange method allows us to work with estimates that are valid only when $k$ is in the “dominant” set $D_n$. But how should we define $D_n$?

Actually I’m not
into dominance.


We have to make $D_n$ small enough that we can make a good estimate; for example, we had better not let $k$ get near $n$, or the term $O\bigl((n-k)^{-1}\bigr)$ in (9.95) will blow up. Yet $D_n$ must be large enough that the tail terms (the terms with $k \notin D_n$) are negligibly small compared with the overall sum. Trial and error is usually necessary to find an appropriate set $D_n$; in this problem the calculations we are about to make will show that it’s wise to define things as follows:

$$k \in D_n \iff |k| \le n^{1/2+\epsilon}. \qquad (9.96)$$

Here $\epsilon$ is a small positive constant that we can choose later, after we get to know the territory. (Our $O$ estimates will depend on the value of $\epsilon$.) Equation (9.95) now reduces to

$$\begin{aligned}
\ln a_k(n) &= \bigl(2n + \tfrac12\bigr)\ln 2 - \sigma - \tfrac12\ln n + O(n^{-1})\\
&\qquad - \bigl(n + k + \tfrac12\bigr)\ln(1 + k/n) - \bigl(n - k + \tfrac12\bigr)\ln(1 - k/n).
\end{aligned} \qquad (9.97)$$

(We have pulled out the large parts of the logarithms, writing

$$\ln(n \pm k) = \ln n + \ln(1 \pm k/n),$$

and this has made a lot of $\ln n$ terms cancel out.)

Now we need to expand the terms $\ln(1 \pm k/n)$ asymptotically, until we have an error term that approaches zero as $n \to \infty$. We are multiplying $\ln(1 \pm k/n)$ by $(n \pm k + \frac12)$, so we should expand the logarithm until we reach $o(n^{-1})$, using the assumption that $|k| \le n^{1/2+\epsilon}$:

$$\ln\Bigl(1 \pm \frac kn\Bigr) = \pm\frac kn - \frac{k^2}{2n^2} + O(n^{-3/2+3\epsilon}).$$

Multiplication by $n \pm k + \frac12$ yields

$$\pm k + \frac{k^2}{2n} + O(n^{-1/2+3\epsilon}),$$

plus other terms that are absorbed in the $O(n^{-1/2+3\epsilon})$. So (9.97) becomes

$$\ln a_k(n) = \bigl(2n + \tfrac12\bigr)\ln 2 - \sigma - \tfrac12\ln n - k^2/n + O(n^{-1/2+3\epsilon}).$$

Taking exponentials, we have

$$a_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\, e^{-k^2/n}\bigl(1 + O(n^{-1/2+3\epsilon})\bigr). \qquad (9.98)$$
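To see (9.98) at work, here is a Python sketch. It is an illustration that assumes $\sigma = \ln\sqrt{2\pi}$, the value established at the end of this section. It compares $\ln a_k(n)$, computed exactly up to rounding via `math.lgamma`, with the logarithm of the main term $2^{2n+1/2}e^{-k^2/n}/(e^\sigma\sqrt n)$ for $n = 1000$ and $|k| \le 30$. The discrepancy stays well under $10^{-3}$ over that range, considerably smaller than the $n^{-1/2+3\epsilon}$ that the derivation allows.

```python
from math import lgamma, log, pi

SIGMA = 0.5 * log(2 * pi)  # assumed value of Stirling's constant


def log_a(n, k):
    """ln a_k(n) = ln C(2n, n+k), via log-Gamma (exact up to rounding)."""
    return lgamma(2 * n + 1) - lgamma(n + k + 1) - lgamma(n - k + 1)


def log_b(n, k):
    """ln of 2^{2n+1/2} e^{-k^2/n} / (e^sigma sqrt(n)), the main term of (9.98)."""
    return (2 * n + 0.5) * log(2) - SIGMA - 0.5 * log(n) - k * k / n


n = 1000
worst = max(abs(log_a(n, k) - log_b(n, k)) for k in range(-30, 31))
print(worst)  # small compared with n^(-1/2) for this n and range of k
```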
This is our approximation, with

$$b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\, e^{-k^2/n}, \qquad c_k(n) = 2^{2n}\, n^{-1+3\epsilon}\, e^{-k^2/n}.$$

Notice that $k$ enters $b_k(n)$ and $c_k(n)$ in a very simple way. We’re in luck, because we will be summing over $k$.

The tail-exchange trick tells us that $\sum_k a_k(n)$ will be approximately $\sum_k b_k(n)$ if we have done a good job of estimation. Let us therefore evaluate

$$\sum_k b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\sum_k e^{-k^2/n} = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,\Theta_n = \frac{2^{2n}\sqrt{2\pi}}{e^\sigma}\bigl(1 + O(n^{-M})\bigr).$$


(Another stroke of luck: We get to use the sum $\Theta_n$ from the previous example.) This is encouraging, because we know that the original sum is actually

$$A_n = \sum_k \binom{2n}{k} = (1 + 1)^{2n} = 2^{2n}.$$

Therefore it looks as if we will have $e^\sigma = \sqrt{2\pi}$, as advertised.

But there’s a catch: We still need to prove that our estimates are good enough. So let’s look first at the error contributed by $c_k(n)$:

$$\Sigma_c(n) = \sum_{|k| \le n^{1/2+\epsilon}} 2^{2n} n^{-1+3\epsilon} e^{-k^2/n} \le 2^{2n} n^{-1+3\epsilon}\,\Theta_n = O(2^{2n} n^{-1/2+3\epsilon}).$$

Good; this is asymptotically smaller than the previous sum, if $3\epsilon < \frac12$.

Next we must check the tails. We have

$$\sum_{k > n^{1/2+\epsilon}} e^{-k^2/n} < \exp\bigl(-\lfloor n^{1/2+\epsilon}\rfloor^2/n\bigr)\bigl(1 + e^{-1/n} + e^{-2/n} + \cdots\bigr) = O(e^{-n^{2\epsilon}}) \cdot O(n),$$

which is $O(n^{-M})$ for all $M$; so $\sum_{k \notin D_n} b_k(n)$ is asymptotically negligible. (We chose the cutoff at $n^{1/2+\epsilon}$ just so that $e^{-k^2/n}$ would be exponentially small outside of $D_n$. Other choices like $n^{1/2}\log n$ would have been good enough too, and the resulting estimates would have been slightly sharper, but the formulas would have come out more complicated. We need not make the strongest possible estimates, since our main goal is to establish the value of the constant $\sigma$.) Similarly, the other tail



$$\sum_{k \notin D_n} a_k(n)$$

is bounded by $2n$ times its largest term, which occurs at the cutoff point $k \approx n^{1/2+\epsilon}$. This term is known to be approximately $b_k(n)$, which is exponentially small compared with $A_n$; and an exponentially small multiplier wipes out the factor of $2n$.

Thus we have successfully applied the tail-exchange trick to prove the estimate

$$2^{2n} = \sum_k \binom{2n}{k} = \frac{\sqrt{2\pi}}{e^\sigma}\, 2^{2n} + O(2^{2n} n^{-1/2+3\epsilon}), \qquad \text{if } 0 < \epsilon < \tfrac16. \qquad (9.99)$$

We may choose $\epsilon = \frac18$ and conclude that

$$\sigma = \tfrac12\ln 2\pi.$$

QED.

What an amazing
coincidence.

I’m tired of getting
to the end of long,
hard books and not
even getting a word
of good wishes from
the author. It would
be nice to read a
“thanks for reading
this, hope it comes
in handy,” instead
of just running into
a hard, cold, card-
board cover at the
end of a long, dry
proof. You know?

Thanks for reading
this, hope it comes
in handy.

— The authors

Exercises

Warmups

1 Prove or disprove: If $f_1(n) \prec g_1(n)$ and $f_2(n) \prec g_2(n)$, then we have $f_1(n) + f_2(n) \prec g_1(n) + g_2(n)$.

2 Which function grows faster:
a $n^{(\ln n)}$ or $(\ln n)^n$?
b $n^{(\ln\ln\ln n)}$ or $(\ln n)!$?
c $(n!)!$ or $((n-1)!)!\,(n-1)!^{n!}$?
d $F_{\lceil H_n\rceil}^2$ or $H_{F_n}$?

3 What’s wrong with the following argument? “Since $n = O(n)$ and $2n = O(n)$ and so on, we have $\sum_{k=1}^n kn = \sum_{k=1}^n O(n) = O(n^2)$.”

4 Give an example of a valid equation that has $O$-notation on the left but not on the right. (Do not use the trick of multiplying by zero; that’s too easy.) Hint: Consider taking limits.

5 Prove or disprove: $O\bigl(f(n) + g(n)\bigr) = f(n) + O\bigl(g(n)\bigr)$, if $f(n)$ and $g(n)$ are positive for all $n$. (Compare with (9.27).)

6 Multiply $\bigl(\ln n + \gamma + O(1/n)\bigr)$ by $\bigl(n + O(\sqrt n)\bigr)$, and express your answer in $O$-notation.

7 Estimate $\sum_{k \ge 0} e^{-k/n}$ with absolute error $O(n^{-1})$.

Basics

8 Give an example of functions $f(n)$ and $g(n)$ such that none of the three relations $f(n) \prec g(n)$, $f(n) \succ g(n)$, $f(n) \asymp g(n)$ is valid, although $f(n)$ and $g(n)$ both increase monotonically to $\infty$.
9 Prove (9.22) rigorously by showing that the left side is a subset of the right side, according to the set-of-functions definition of $O$.

10 Prove or disprove: $\cos O(x) = 1 + O(x^2)$ for all real $x$.

11 Prove or disprove: $O(x + y)^2 = O(x^2) + O(y^2)$.

12 Prove that

$$1 + \frac2n + O(n^{-2}) = \Bigl(1 + \frac2n\Bigr)\bigl(1 + O(n^{-2})\bigr),$$

as $n \to \infty$.

13 Evaluate $\bigl(n + 2 + O(n^{-1})\bigr)^n$ with relative error $O(n^{-1})$.

14 Prove that $(n + \alpha)^{n+\beta} = n^{n+\beta}e^\alpha\bigl(1 + \alpha(\beta - \frac12\alpha)n^{-1} + O(n^{-2})\bigr)$.

15 Give an asymptotic formula for the “middle” trinomial coefficient $\binom{3n}{n,n,n}$, correct to relative error $O(n^{-3})$.

16 Show that if $B(1 - x) = -B(x) \ge 0$ for $0 < x < \frac12$, we have

$$\int_a^b B(\{x\}) f(x)\,dx \ge 0$$

if we assume also that $f'(x) \ge 0$ for $a \le x \le b$.

17 Use generating functions to show that $B_m(\frac12) = (2^{1-m} - 1)B_m$, for all $m \ge 0$.

18 Find $\sum_k \binom{2n}{k}^\alpha$ with relative error $O(n^{-1/4})$, when $\alpha > 0$.

Homework exercises 

19 Use a computer to compare the left and right sides of the approximations in Table 452, when $n = 10$, $z = \alpha = 0.1$, and $O\bigl(f(n)\bigr) = O\bigl(f(z)\bigr) = 0$.

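20 Prove or disprove the following estimates, as $n \to \infty$:
a $O\Bigl(\bigl(\frac{n^2}{\log\log n}\bigr)^{1/2}\Bigr) = O\bigl(\lfloor\sqrt n\rfloor^2\bigr)$.
b $e^{(1+O(1/n))^2} = e + O(1/n)$.
c $n! = O\Bigl(\bigl((1 - 1/n)^n n\bigr)^n\Bigr)$.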

21 Equation (9.48) gives the $n$th prime with relative error $O(\log n)^{-2}$. Improve the relative error to $O(\log n)^{-3}$ by starting with another term of (9.31) in (9.46).

22 Improve (9.54) to $O(n^{-3})$.

23 Push the approximation (9.62) further, getting absolute error $O(n^{-3})$. Hint: Let $\ldots = c/(n+1)(n+2) + h_n$; what recurrence does $h_n$ satisfy?
24 Suppose $a_n = O\bigl(f(n)\bigr)$ and $b_n = O\bigl(f(n)\bigr)$. Prove or disprove that the convolution $\sum_{k=0}^n a_k b_{n-k}$ is also $O\bigl(f(n)\bigr)$, in the following cases:
a $f(n) = n^{-\alpha}$, $\alpha > 1$.
b $f(n) = \alpha^{-n}$, $\alpha > 1$.

25 Prove (9.1) and (9.2), with which we opened this chapter. 

26 Equation (9.91) shows how to evaluate $\ln 10!$ with an absolute error $< 1/126000000$. Therefore if we take exponentials, we get $10!$ with a relative error that is less than $e^{1/126000000} - 1 < 10^{-8}$. (In fact, the approximation gives 3628799.9714.) If we now round to the nearest integer, knowing that $10!$ is an integer, we get an exact result.

Is it always possible to calculate $n!$ in a similar way, if enough terms of Stirling’s approximation are computed? Estimate the value of $m$ that gives the best approximation to $\ln n!$, when $n$ is a fixed (large) integer. Compare the absolute error in this approximation with $n!$ itself.

27 Use Euler’s summation formula to find the asymptotic value of $H_n^{(-\alpha)} = \sum_{k=1}^n k^\alpha$, where $\alpha$ is any fixed real number. (Your answer may involve a constant that you do not know in closed form.)

28 Exercise 5.13 defines the hyperfactorial function $Q_n = 1^1 2^2 \ldots n^n$. Find the asymptotic value of $Q_n$ with relative error $O(n^{-1})$. (Your answer may involve a constant that you do not know in closed form.)

29 Estimate the function $1^{1/1} 2^{1/2} \ldots n^{1/n}$ as in the previous exercise.

30 Find the asymptotic value of $\sum_{k \ge 0} k^l e^{-k^2/n}$ with absolute error $O(n^{-3})$, when $l$ is a fixed nonnegative integer.

31 Evaluate $\sum_{k \ge 0} 1/(c^k + c^m)$ with absolute error $O(c^{-3m})$, when $c > 1$ and $m$ is a positive integer.

Exam problems 

32 Evaluate $e^{H_n + H_n^{(2)}}$ with absolute error $O(n^{-1})$.

33 Evaluate $\sum_{k \ge 0} \binom nk / n^{\overline k}$ with absolute error $O(n^{-3})$.

34 Determine values $A$ through $F$ such that $(1 + 1/n)^{nH_n}$ is

$$An + B(\ln n)^2 + C\ln n + D + \frac{E(\ln n)^2}{n} + \frac{F\ln n}{n} + O(n^{-1}).$$

35 Evaluate $\sum_{k=1}^n 1/kH_k$ with absolute error $O(1)$.

36 Evaluate $S_n = \sum_{k=1}^n 1/(n^2 + k^2)$ with absolute error $O(n^{-5})$.

37 Evaluate $\sum_{k=1}^n (n \bmod k)$ with absolute error $O(n\log n)$.

38 Evaluate $\sum_{k \ge 0} k^k \binom nk$ with relative error $O(n^{-1})$.
39 Evaluate $\sum_{0 \le k < n} \ln(n - k)(\ln n)^k/k!$ with absolute error $O(n^{-1})$. Hint: Show that the terms for $k \ge 10\ln n$ are negligible.

40 Let $m$ be a (fixed) positive integer. Evaluate $\sum_{k=1}^n (-1)^k H_k^m$ with absolute error $O(1)$.

41 Evaluate the “Fibonacci factorial” $\prod_{k=1}^n F_k$ with relative error $O(n^{-1})$ or better. Your answer may involve a constant whose value you do not know in closed form.

42 Let $\alpha$ be a constant in the range $0 < \alpha < \frac12$. We’ve seen in previous chapters that there is no general closed form for the sum $\sum_{k \le \alpha n} \binom nk$. Show that there is, however, an asymptotic formula

$$\sum_{k \le \alpha n} \binom nk = 2^{nH(\alpha) - \frac12\lg n + O(1)},$$

where $H(\alpha) = \alpha\lg\frac1\alpha + (1 - \alpha)\lg\frac{1}{1-\alpha}$. Hint: Show that $\binom{n}{k-1} < \frac{\alpha}{1-\alpha}\binom nk$ for $0 < k \le \alpha n$.

43 Show that $C_n$, the number of ways to change $n$ cents (as considered in Chapter 7) is asymptotically $cn^4 + O(n^3)$ for some constant $c$. What is that constant?


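44 Prove that

$$x^{\underline{1/2}} = x^{1/2}\left[{1/2 \atop 1/2}\right] - x^{-1/2}\left[{1/2 \atop -1/2}\right] + x^{-3/2}\left[{1/2 \atop -3/2}\right] + O(x^{-5/2}),$$

as $x \to \infty$. (Recall the definition $x^{\underline{1/2}} = x!/(x - \frac12)!$ in (5.88), and the definition of generalized Stirling numbers in Table 272.)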

45 Let $\alpha$ be an irrational number between 0 and 1. Chapter 3 discusses the quantity $D(\alpha, n)$, which measures the maximum discrepancy by which the fractional parts $\{k\alpha\}$ for $0 \le k < n$ deviate from a uniform distribution. The recurrence

$$D(\alpha, n) \le D\bigl(\{\alpha^{-1}\}, \lfloor\alpha n\rfloor\bigr) + \alpha^{-1} + 2$$

was proved in (3.31); we also have the obvious bounds

$$0 \le D(\alpha, n) \le n.$$

Prove that $\lim_{n\to\infty} D(\alpha, n)/n = 0$. Hint: Chapter 6 discusses continued fractions.
46 Show that the Bell number $\varpi_n = e^{-1}\sum_{k\ge0}k^n/k!$ of exercise 7.15 is
asymptotically equal to
$$m(n)^n e^{m(n)-n-1/2}/\sqrt{\ln n},$$
where $m(n)\ln m(n) = n - \frac12$, and estimate the relative error in this
approximation.

47 Let m be an integer ≥ 2. Analyze the two sums
$$\sum_{k=1}^n\lfloor\log_m k\rfloor \quad\text{and}\quad \sum_{k=1}^n\lceil\log_m k\rceil;$$
which is asymptotically closer to $\log_m n!$?

48 Consider a table of the harmonic numbers $H_k$ for $1 \le k \le n$ in decimal
notation. The kth entry $H_k$ has been correctly rounded to $d_k$ significant
digits, where $d_k$ is just large enough to distinguish this value from the
values of $H_{k-1}$ and $H_{k+1}$. For example, here is an extract from the table,
showing five entries where $H_k$ passes 10:

      k     H_k            H_k (as tabulated)   d_k
  12364     9.99980041−    9.9998                5
  12365     9.99988128+    9.9999                5
  12366     9.99996215−    9.99996               6
  12367    10.00004301−    10.0000               6
  12368    10.00012386+    10.0001               6

Estimate the total number of digits in the table, $\sum_{k=1}^n d_k$, with an absolute
error of O(n).
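To see how the five rows above arise, here is a small sketch. It reads "just large enough to distinguish" as: the smallest d for which $H_k$, rounded to d significant digits, differs from both neighbours rounded to d digits. That reading is our assumption, though it reproduces the extract. The names `round_sig`, `digits_needed` and `harmonic_table` are our own.

```python
def round_sig(x, d):
    """Round x to d significant digits (via the %g formatter)."""
    return float(f"{x:.{d}g}")

def digits_needed(prev, cur, nxt):
    """Smallest d for which cur, rounded to d significant digits, differs
    from both neighbours rounded the same way (assumed reading of d_k)."""
    d = 1
    while round_sig(cur, d) in (round_sig(prev, d), round_sig(nxt, d)):
        d += 1
    return d

def harmonic_table(lo, hi):
    """Return {k: H_k} for lo-1 <= k <= hi+1, summing 1/j in floating point."""
    h, table = 0.0, {}
    for j in range(1, hi + 2):
        h += 1.0 / j
        if j >= lo - 1:
            table[j] = h
    return table

H = harmonic_table(12364, 12368)
d = {k: digits_needed(H[k - 1], H[k], H[k + 1]) for k in range(12364, 12369)}
```

Summing `digits_needed` over a whole table gives a numerical check on any proposed estimate of $\sum d_k$.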

49 In Chapter 6 we considered the tale of a worm that reaches the end of a
stretching band after n seconds, where $H_{n-1} < 100 \le H_n$. Prove that if
n is a positive integer such that $H_{n-1} \le \alpha \le H_n$, then
$$\lfloor e^{\alpha-\gamma}\rfloor \le n \le \lceil e^{\alpha-\gamma}\rceil.$$

50 Venture capitalists in Silicon Valley are being offered a deal giving them
a chance for an exponential payoff on their investments: For an n million
dollar investment, where n ≥ 2, the GKP consortium promises to
pay up to N million dollars after one year, where $N = 10^n$. Of course
there’s some risk; the actual deal is that GKP pays k million dollars with
probability $1/(k^2H_N^{(2)})$, for each integer k in the range 1 ≤ k ≤ N. (All
payments are in megabucks, that is, in exact multiples of \$1,000,000; the
payoff is determined by a truly random process.) Notice that an investor
always gets at least a million dollars back.

a What is the asymptotic expected return after one year, if n million
dollars are invested? (In other words, what is the mean value of the
payment?) Your answer should be correct within an absolute error
of $O(10^{-n})$ dollars.

b What is the asymptotic probability that you make a profit, if you
invest n million? (In other words, what is the chance that you get
back more than you put in?) Your answer here should be correct
within an absolute error of $O(n^{-3})$.
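Before attempting the asymptotics, it may help to evaluate the two quantities directly for small n. This is a brute-force sketch, feasible only while $N = 10^n$ is modest. The mean payment is $\sum_k k/(k^2H_N^{(2)}) = H_N/H_N^{(2)}$ megabucks, and "making a profit" means receiving k > n. The function name `gkp_deal` is ours.

```python
import math

def gkp_deal(n):
    """Exact-by-summation mean payoff (megabucks) and profit probability
    for an n-megabuck investment; payoff k has probability 1/(k^2 H_N^(2))."""
    N = 10 ** n
    h2 = math.fsum(1.0 / (k * k) for k in range(1, N + 1))           # H_N^(2)
    mean = math.fsum(1.0 / k for k in range(1, N + 1)) / h2          # sum k*Pr(k) = H_N / H_N^(2)
    profit = math.fsum(1.0 / (k * k) for k in range(n + 1, N + 1)) / h2  # Pr(k > n)
    return mean, profit

mean2, profit2 = gkp_deal(2)   # invest 2 megabucks, N = 100
```

Comparing such values for n = 2, 3, 4, 5 with a candidate asymptotic formula is a quick sanity check.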

Bonus problems 

51 Prove or disprove: $\int_n^\infty O(x^{-2})\,dx = O(n^{-1})$ as $n \to \infty$.

52 Show that there exists a power series $A(z) = \sum_{n\ge0}a_nz^n$, convergent for
all complex z, such that
$$A(n) \succ n^{n^{\cdot^{\cdot^{\cdot^{n}}}}}\Big\}\,n.$$



53 Prove that if f(x) is a function whose derivatives satisfy
$$f'(x) \le 0,\quad -f''(x) \le 0,\quad f'''(x) \le 0,\quad \ldots,\quad (-1)^m f^{(m+1)}(x) \le 0$$
for all x ≥ 0, then we have
$$f(x) = f(0) + \frac{f'(0)}{1!}x + \cdots + \frac{f^{(m-1)}(0)}{(m-1)!}x^{m-1} + O(x^m), \quad\text{for } x \ge 0.$$
In particular, the case $f(x) = -\ln(1+x)$ proves (9.64) for all k, n > 0.

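54 Let f(x) be a positive, differentiable function such that $xf'(x) \prec f(x)$ as
$x \to \infty$. Prove that
$$\sum_{k\ge n}\frac{f(k)}{k^{1+\alpha}} = O\Bigl(\frac{f(n)}{n^\alpha}\Bigr), \quad\text{if } \alpha > 0.$$
Hint: Consider the quantity $f(k-\frac12)/(k-\frac12)^\alpha - f(k+\frac12)/(k+\frac12)^\alpha$.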

55 Improve (9.99) to relative error $O(n^{-3/2+5\epsilon})$.

56 The quantity $Q(n) = 1 + \frac{n-1}{n} + \frac{(n-1)(n-2)}{n^2} + \cdots = \sum_{k\ge1}n^{\underline k}/n^k$ occurs in
the analysis of many algorithms. Find its asymptotic value, with absolute
error o(1).
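A conjectured asymptotic formula for Q(n) can be tested numerically against the exact definition. The sketch below evaluates $\sum_{k\ge1}n^{\underline k}/n^k$ with exact rational arithmetic; terms vanish once k > n. The function name `Q` simply mirrors the exercise.

```python
from fractions import Fraction

def Q(n):
    """Q(n) = sum_{k>=1} n(n-1)...(n-k+1) / n^k, evaluated exactly."""
    total, term = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        term *= Fraction(n - k + 1, n)   # term is now n^{underline k} / n^k
        total += term
    return total
```

For example, `float(Q(1000))` can be compared with a candidate formula; the rational arithmetic keeps the value exact, at the cost of speed for very large n.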

57 An asymptotic formula for Golomb’s sum $\sum_{k\ge1}1/k\lfloor 1 + \log_n k\rfloor^2$ is derived
in (9.54). Find an asymptotic formula for the analogous sum without
floor brackets, $\sum_{k\ge1}1/k(1 + \log_n k)^2$. Hint: Consider the integral
$$\int_0^\infty ue^{-u}k^{-tu}\,du = 1/(1 + t\ln k)^2.$$


I once earned
$O(10^{-n})$ dollars.
58 Prove that
$$B_m(\{\theta\}) = -2\,\frac{m!}{(2\pi)^m}\sum_{k\ge1}\frac{\cos(2\pi k\theta - \frac12\pi m)}{k^m}, \quad\text{for } m \ge 2,$$
by using residue calculus, integrating
$$\frac{1}{2\pi i}\oint\frac{2\pi i\,e^{2\pi iz\theta}}{e^{2\pi iz} - 1}\,\frac{dz}{z^m}$$
on the square contour z = x + iy, where max(|x|, |y|) = M + ½, and letting
the integer M tend to ∞.
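A numerical check of this Fourier expansion, before proving it, takes only a few lines. The sketch compares a truncated version of the series with the explicit Bernoulli polynomials $B_2(x) = x^2 - x + \frac16$ and $B_3(x) = x^3 - \frac32x^2 + \frac12x$, for $0 \le \theta < 1$ (so $\{\theta\} = \theta$). The truncation length 20000 is an arbitrary choice of ours.

```python
import math

def bernoulli_poly(m, x):
    """B_2(x) and B_3(x) written out explicitly (enough for this check)."""
    if m == 2:
        return x * x - x + 1 / 6
    if m == 3:
        return x ** 3 - 1.5 * x * x + 0.5 * x
    raise ValueError("only m = 2, 3 are tabulated here")

def fourier_side(m, theta, terms=20000):
    """-2 m!/(2 pi)^m * sum_{k<=terms} cos(2 pi k theta - pi m/2) / k^m."""
    s = math.fsum(math.cos(2 * math.pi * k * theta - 0.5 * math.pi * m) / k ** m
                  for k in range(1, terms + 1))
    return -2 * math.factorial(m) / (2 * math.pi) ** m * s
```

Because the terms decrease like $k^{-m}$, the truncation error for m = 2 is already below $10^{-5}$ here.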

59 Let $\Theta_n(t) = \sum_k e^{-(k+t)^2/n}$, a periodic function of t. Show that the
expansion of $\Theta_n(t)$ as a Fourier series is
$$\Theta_n(t) = \sqrt{\pi n}\,\bigl(1 + 2e^{-\pi^2 n}(\cos 2\pi t) + 2e^{-4\pi^2 n}(\cos 4\pi t) + 2e^{-9\pi^2 n}(\cos 6\pi t) + \cdots\bigr).$$
(This formula gives a rapidly convergent series for the sum $\Theta_n = \Theta_n(0)$
in equation (9.93).)
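The claimed Fourier expansion can be checked numerically by comparing both sides for a few values of n and t. The truncation limits K and M below are our own choices. They are ample for the moderate n used here, but larger n would need a larger K on the direct side.

```python
import math

def theta_direct(n, t, K=200):
    """Theta_n(t) = sum over integers k of exp(-(k + t)^2 / n), truncated to |k| <= K."""
    return math.fsum(math.exp(-(k + t) ** 2 / n) for k in range(-K, K + 1))

def theta_fourier(n, t, M=10):
    """sqrt(pi n) * (1 + 2 sum_{m>=1} exp(-pi^2 m^2 n) cos(2 pi m t)), truncated to m <= M."""
    s = 1 + 2 * math.fsum(math.exp(-math.pi ** 2 * m * m * n) * math.cos(2 * math.pi * m * t)
                          for m in range(1, M + 1))
    return math.sqrt(math.pi * n) * s
```

The Fourier side converges much faster as n grows, which is the point of the parenthetical remark.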

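60 Explain why the coefficients in the asymptotic expansion
$$\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\Bigl(1 - \frac{1}{8n} + \frac{1}{128n^2} + \frac{5}{1024n^3} - \frac{21}{32768n^4} + O(n^{-5})\Bigr)$$
all have denominators that are powers of 2.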
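Before explaining the denominators, one can at least confirm the expansion numerically. The sketch compares $\binom{2n}{n}\sqrt{\pi n}/4^n$, computed from the exact binomial coefficient, with the truncated series. The difference should shrink like $n^{-5}$. Incidentally, all five coefficients are exactly representable as binary floating-point numbers, precisely because their denominators are powers of 2.

```python
import math

COEFFS = [1, -1 / 8, 1 / 128, 5 / 1024, -21 / 32768]   # coefficients of n^0 .. n^-4

def ratio(n):
    """binom(2n, n) * sqrt(pi n) / 4^n, using the exact binomial coefficient."""
    return math.comb(2 * n, n) / 4 ** n * math.sqrt(math.pi * n)

def series(n):
    """Truncated expansion 1 - 1/(8n) + 1/(128n^2) + 5/(1024n^3) - 21/(32768n^4)."""
    return sum(c / n ** j for j, c in enumerate(COEFFS))
```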

61 Exercise 45 proves that the discrepancy D(α, n) is o(n) for all irrational
numbers α. Exhibit an irrational α such that D(α, n) is not $O(n^{1-\epsilon})$
for any ε > 0.

62 Given n, let ${n \brace m(n)} = \max_k{n \brace k}$ be the largest entry in row n of Stirling’s
subset triangle. Show that for all sufficiently large n, we have $m(n) = \lfloor\bar m(n)\rfloor$
or $m(n) = \lceil\bar m(n)\rceil$, where
$$\bar m(n)\bigl(\bar m(n) + 2\bigr)\ln\bigl(\bar m(n) + 2\bigr) = n\bigl(\bar m(n) + 1\bigr).$$
Hint: This is difficult.

63 Prove that S. W. Golomb’s self-describing sequence of exercise 2.36 satisfies
$f(n) = \phi^{2-\phi}n^{\phi-1} + O(n^{\phi-1}/\log n)$.

64 Find a proof of the identity
$$\sum_{n\ge1}\frac{\cos 2n\pi x}{n^2} = \pi^2\Bigl(x^2 - x + \frac16\Bigr) \quad\text{for } 0 \le x \le 1,$$
that uses only “Eulerian” (eighteenth-century) mathematics.
65 What are the coefficients of the asymptotic series
$$1 + \frac{1}{n-1} + \frac{1}{(n-1)(n-2)} + \cdots + \frac{1}{(n-1)!} = a_0 + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots\;?$$


Research problems 

66 Find a “combinatorial” proof of Stirling’s approximation. (Note that $n^n$
is the number of mappings of {1, 2, ..., n} into itself, and n! is the number
of mappings of {1, 2, ..., n} onto itself.)

67 Consider an n × n array of dots, n ≥ 3, in which each dot has four
neighbors. (At the edges we “wrap around” modulo n.) Let $\chi_n$ be the
number of ways to assign the colors red, white, and blue to these dots in
such a way that no neighboring dots have the same color. (Thus $\chi_3 = 12$.)
Prove that
$$\chi_n \sim \bigl(\tfrac43\bigr)^{3n^2/2}e^{-\pi/6}.$$
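Small cases of this research problem can be counted exactly with a transfer matrix. Each admissible row is a proper 3-coloring of an n-cycle, two rows may be stacked when they differ in every column, and the trace of the n-th power of that compatibility matrix counts the colorings of the torus. This is a standard counting technique, not something the exercise prescribes; `torus_colorings` and `conjectured` are our own names.

```python
import math
from itertools import product

def torus_colorings(n, colors=3):
    """Proper colorings of the n x n torus grid, by the transfer-matrix method."""
    # Admissible rows: proper colorings of an n-cycle (each row wraps around).
    rows = [r for r in product(range(colors), repeat=n)
            if all(r[i] != r[(i + 1) % n] for i in range(n))]
    size = len(rows)
    # compat[i][j] = 1 when row j may sit directly below row i.
    compat = [[int(all(a != b for a, b in zip(r, s))) for s in rows] for r in rows]
    # trace(compat^n) counts cyclic stacks of n rows (vertical wrap-around).
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        power = [[sum(power[i][k] * compat[k][j] for k in range(size)) for j in range(size)]
                 for i in range(size)]
    return sum(power[i][i] for i in range(size))

def conjectured(n):
    """Right-hand side (4/3)^(3n^2/2) * e^(-pi/6) of the asymptotic claim."""
    return (4 / 3) ** (1.5 * n * n) * math.exp(-math.pi / 6)

for n in range(3, 7):
    count = torus_colorings(n)
    print(n, count, count / conjectured(n))
```

The printed ratios only hint at the trend for such small n; they are not evidence for or against the conjecture.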

68 Let $Q_n$ be the least integer m such that $H_m > n$. Find the smallest
integer n such that $Q_n \ne \lfloor e^{n-\gamma} + \frac12\rfloor$, or prove that no such n exists.
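A quick experiment shows why the formula is plausible: compute $Q_n$ by summing harmonic numbers and compare it with $\lfloor e^{n-\gamma} + \frac12\rfloor$. This floating-point sketch is only practical for small n, and `Q_values` and `rounded_guess` are names we made up. It settles nothing about the research question.

```python
import math

EULER_GAMMA = 0.5772156649015329

def Q_values(n_max):
    """Q_n = least m with H_m > n, for n = 1..n_max (floating-point harmonic sums)."""
    result, h, m = {}, 0.0, 0
    for n in range(1, n_max + 1):
        while h <= n:
            m += 1
            h += 1.0 / m
        result[n] = m
    return result

def rounded_guess(n):
    """floor(e^(n - gamma) + 1/2)."""
    return math.floor(math.exp(n - EULER_GAMMA) + 0.5)

Q = Q_values(12)
```

The value $Q_{10} = 12367$ agrees with the table in exercise 48, where $H_k$ first exceeds 10 at k = 12367.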


Th-th-th-that’s all,
folks!



A 


Answers to Exercises 


(The first finder of 
every error in this 
book will receive 
a reward of $2.56.) 

Does that mean 
I have to find every 
error? 

(We meant to say 
“any error.”) 

Does that mean 
only one person gets 
a reward? 

(Hmmm. Try it and 
see.) 


EVERY EXERCISE is answered here (at least briefly), and some of these 
answers go beyond what was asked. Readers will learn best if they make a 
serious attempt to find their own answers before peeking at this appendix. 

The authors will be interested to learn of any solutions (or partial 
solutions) to the research problems, or of any simpler (or more correct) ways 
to solve the non-research ones. 

1.1 The proof is fine except when n = 2. If all sets of two horses have 
horses of the same color, the statement is true for any number of horses. 

1.2 If $X_n$ is the number of moves, we have $X_0 = 0$ and $X_n = X_{n-1} + 1 +
X_{n-1} + 1 + X_{n-1}$ when n > 0. It follows (for example by adding 1 to both
sides) that $X_n = 3^n - 1$. (After $\frac12X_n$ moves, it turns out that the entire tower
will be on the middle peg, halfway home!)

1.3 There are $3^n$ possible arrangements, since each disk can be on any of
the pegs. We must hit them all, since the shortest solution takes $3^n - 1$ moves.
(This construction is equivalent to a “ternary Gray code,” which runs through
all numbers from $(0\ldots0)_3$ to $(2\ldots2)_3$, changing only one digit at a time.)
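The claims in answers 1.2 and 1.3 are easy to confirm by simulation. The sketch generates the recursive solution in which every move touches the middle peg, then replays it while checking legality. It confirms the count $3^n - 1$, the halfway position, and that all $3^n$ configurations are visited. The peg labels A, B, C (with B in the middle) are our own.

```python
def middle_peg_moves(n, src="A", dst="C", mid="B"):
    """Moves (disk, from, to) transferring n disks src -> dst when every move
    must involve the middle peg `mid` (no direct src <-> dst moves)."""
    if n == 0:
        return []
    return (middle_peg_moves(n - 1, src, dst, mid)      # top n-1 disks: src -> dst
            + [(n, src, mid)]                           # largest disk: src -> mid
            + middle_peg_moves(n - 1, dst, src, mid)    # top n-1 disks: dst -> src
            + [(n, mid, dst)]                           # largest disk: mid -> dst
            + middle_peg_moves(n - 1, src, dst, mid))   # top n-1 disks: src -> dst

def replay(n, moves):
    """Apply the moves, checking legality; return the list of visited configurations."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    where = ["A"] * n                      # where[d-1] = peg holding disk d
    seen = [tuple(where)]
    for disk, a, b in moves:
        assert pegs[a] and pegs[a][-1] == disk, "must move the top disk"
        assert not pegs[b] or pegs[b][-1] > disk, "cannot put a disk on a smaller one"
        pegs[b].append(pegs[a].pop())
        where[disk - 1] = b
        seen.append(tuple(where))
    return seen
```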

1.4 No. If the largest disk doesn’t have to move, $2^{n-1} - 1$ moves will suffice
(by induction); otherwise $(2^{n-1} - 1) + 1 + (2^{n-1} - 1)$ will suffice (again by
induction).


The number of
intersection points
turns out to give
the whole story;
convexity was a red
herring.


1.5 No; different circles can intersect in at most two points, so the fourth 
circle can increase the number of regions to at most 14. However, it is possible 
to do the job with ovals: 
Venn [359] claimed that there is no way to do the five-set case with ellipses,
but a five-set construction with ellipses was found by Grünbaum [167].

1.6 If the nth line intersects the previous lines in k > 0 distinct points, we
get k − 1 new bounded regions (assuming that none of the previous lines were
mutually parallel) and two new infinite regions. Hence the maximum number
of bounded regions is $(n-2) + (n-3) + \cdots = S_{n-2} = (n-1)(n-2)/2 = L_n - 2n$.

1.7 The basis is unproved; and in fact, H(1) ≠ 2.

1.8 $Q_2 = (1+\beta)/\alpha$; $Q_3 = (1+\alpha+\beta)/\alpha\beta$; $Q_4 = (1+\alpha)/\beta$; $Q_5 = \alpha$;
$Q_6 = \beta$. So the sequence is periodic!

1.9 (a) We get P(n − 1) from the inequality
$$x_1\ldots x_{n-1}\Bigl(\frac{x_1+\cdots+x_{n-1}}{n-1}\Bigr) \le \Bigl(\frac{x_1+\cdots+x_{n-1}}{n-1}\Bigr)^n.$$
(b) $x_1\ldots x_nx_{n+1}\ldots x_{2n} \le \bigl(((x_1+\cdots+x_n)/n)((x_{n+1}+\cdots+x_{2n})/n)\bigr)^n$ by
P(n); the product inside is $\le ((x_1+\cdots+x_{2n})/2n)^2$ by P(2). (c) For example,
P(5) follows from P(6) from P(3) from P(4) from P(2).

1.10 First show that $R_n = R_{n-1} + 1 + Q_{n-1} + 1 + R_{n-1}$, when n > 0.
Incidentally, the methods of Chapter 7 will tell us that $Q_n = \bigl((1+\sqrt3)^{n+1} -
(1-\sqrt3)^{n+1}\bigr)/(2\sqrt3) - 1$.
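The closed form for $Q_n$ can be checked against the recurrences. Besides the relation for $R_n$ shown above, the sketch uses $Q_n = 2R_{n-1} + 1$ with $Q_0 = R_0 = 0$. We take that relation from the statement of exercise 1.10, which is not reproduced here, so treat it as an assumption. Writing $(1+\sqrt3)^k = a + b\sqrt3$ makes $(1-\sqrt3)^k = a - b\sqrt3$, so the closed form reduces to the integer b minus 1.

```python
def clockwise_counts(n_max):
    """Q_n, R_n from Q_n = 2R_{n-1} + 1 (assumed, from exercise 1.10) and
    R_n = R_{n-1} + 1 + Q_{n-1} + 1 + R_{n-1}, with Q_0 = R_0 = 0."""
    Q, R = [0], [0]
    for n in range(1, n_max + 1):
        Q.append(2 * R[n - 1] + 1)
        R.append(R[n - 1] + 1 + Q[n - 1] + 1 + R[n - 1])
    return Q, R

def closed_form_Q(n):
    """((1+sqrt3)^(n+1) - (1-sqrt3)^(n+1)) / (2 sqrt3) - 1, in exact integers."""
    a, b = 1, 0                      # (1 + sqrt3)^0 = 1 + 0*sqrt3
    for _ in range(n + 1):
        a, b = a + 3 * b, a + b      # multiply by (1 + sqrt3)
    return b - 1
```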

1.11 (a) We cannot do better than to move a double (n − 1)-tower, then
move (and invert the order of) the two largest disks, then move the double
(n − 1)-tower again; hence $A_n = 2A_{n-1} + 2$ and $A_n = 2T_n = 2^{n+1} - 2$. This
solution interchanges the two largest disks but returns the other 2n − 2 to
their original order.

(b) Let $B_n$ be the minimum number of moves. Then $B_1 = 3$, and it can
be shown that no strategy does better than $B_n = A_{n-1} + 2 + A_{n-1} + 2 + B_{n-1}$
when n > 1. Hence $B_n = 2^{n+2} - 5$, for all n > 0. Curiously this is just $2A_n - 1$,
and we also have $B_n = A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1}$.

1.12 If all $m_k > 0$, then $A(m_1, \ldots, m_n) = 2A(m_1, \ldots, m_{n-1}) + m_n$. This is
an equation of the “generalized Josephus” type, with solution $(m_1\ldots m_n)_2 =
2^{n-1}m_1 + \cdots + 2m_{n-1} + m_n$.

Incidentally, the corresponding generalization of exercise 11b appears
to satisfy the recurrence
$$B(m_1, \ldots, m_n) = \begin{cases} A(m_1, \ldots, m_n), & \text{if } m_n = 1;\\ 2m_n - 1, & \text{if } n = 1;\\ 2A(m_1, \ldots, m_{n-1}) + 2m_n + B(m_1, \ldots, m_{n-1}), & \text{if } n > 1 \text{ and } m_n > 1. \end{cases}$$
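As a consistency check, the generalized formulas of answer 1.12 should reduce to the answers of 1.11 when every $m_k = 2$, and to the ordinary tower count $2^n - 1$ when every $m_k = 1$. The small sketch below encodes both recurrences directly; the function names are ours.

```python
def A(ms):
    """A(m_1..m_n) = 2 A(m_1..m_{n-1}) + m_n, with A() = 0, i.e. (m_1...m_n)_2."""
    total = 0
    for m in ms:
        total = 2 * total + m
    return total

def B(ms):
    """The conjectured recurrence for B(m_1..m_n), case by case as displayed above."""
    n = len(ms)
    if ms[-1] == 1:
        return A(ms)
    if n == 1:
        return 2 * ms[0] - 1
    return 2 * A(ms[:-1]) + 2 * ms[-1] + B(ms[:-1])
```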


This answer assumes
that n > 0.
1.13 Given n straight lines that define $L_n$ regions, we can replace them
by extremely narrow zig-zags with segments sufficiently long that there are
nine intersections between each pair of zig-zags. This shows that $ZZ_n =
ZZ_{n-1} + 9n - 8$, for all n > 0; consequently $ZZ_n = 9S_n - 8n + 1 = \frac92n^2 - \frac72n + 1$.

1.14 The number of new 3-dimensional regions defined by each new cut is
the number of 2-dimensional regions defined in the new plane by its intersections
with the previous planes. Hence $P_n = P_{n-1} + L_{n-1}$, and it turns out
that $P_5 = 26$. (Six cuts in a cubical piece of cheese can make 27 cubelets, or
up to $P_6 = 42$ pieces of weirder shapes.)

Incidentally, the solution to this recurrence fits into a nice pattern if
we express it in terms of binomial coefficients (see Chapter 5):
$$X_n = \binom n0 + \binom n1; \qquad L_n = \binom n0 + \binom n1 + \binom n2; \qquad P_n = \binom n0 + \binom n1 + \binom n2 + \binom n3.$$


I bet I know what 
happens in four 
dimensions! 



Here $X_n$ is the maximum number of 1-dimensional regions definable by n
points on a line.
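The recurrence $P_n = P_{n-1} + L_{n-1}$ and the binomial pattern are easy to compare mechanically. The sketch below takes $P_0 = 1$ (an uncut piece) and $L_n = n(n+1)/2 + 1$, the number of regions cut out by n lines.

```python
from math import comb

def regions_by_lines(n):
    """L_n = n(n+1)/2 + 1, the maximum number of regions defined by n lines."""
    return n * (n + 1) // 2 + 1

def pieces(n_max):
    """P_n from P_0 = 1 and P_n = P_{n-1} + L_{n-1}."""
    P = [1]
    for n in range(1, n_max + 1):
        P.append(P[-1] + regions_by_lines(n - 1))
    return P

P = pieces(12)
```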

1.15 The function I satisfies the same recurrence as J when n > 1, but I(1)
is undefined. Since I(2) = 2 and I(3) = 1, there’s no value of I(1) = α that
will allow us to use our general method; the “end game” of unfolding depends
on the two leading bits in n’s binary representation.

If $n = 2^m + 2^{m-1} + k$, where $0 \le k < 2^{m+1} + 2^m - (2^m + 2^{m-1}) =
2^m + 2^{m-1}$, the solution is I(n) = 2k + 1 for all n > 2. Another way to express
this, in terms of the representation $n = 2^m + l$, is to say that
$$I(n) = \begin{cases} J(n) + 2^{m-1}, & \text{if } 0 \le l < 2^{m-1};\\ J(n) - 2^m, & \text{if } 2^{m-1} \le l < 2^m. \end{cases}$$
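A direct simulation confirms the two-case formula. Here I(n) is read as the number of the next-to-last person eliminated when every second person is removed, which is our reading of exercise 1.15. The sketch compares the simulated value with the formula for n up to 300.

```python
def elimination_order(n):
    """Order in which 1..n leave the circle when every second person is removed."""
    people = list(range(1, n + 1))
    order, i = [], 1              # person 2 (index 1) is the first to go
    while people:
        i %= len(people)
        order.append(people.pop(i))
        i += 1                    # skip the next survivor
    return order

def I_formula(n):
    """Next-to-last survivor via n = 2^m + l (valid for n >= 3)."""
    m = n.bit_length() - 1
    l = n - (1 << m)
    J = 2 * l + 1                 # the Josephus number J(n)
    return J + (1 << (m - 1)) if l < (1 << (m - 1)) else J - (1 << m)
```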


1.16 Let $g(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma$. We know from (1.18)
that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\beta_{b_{m-1}}\beta_{b_{m-2}}\ldots\beta_{b_1}\beta_{b_0})_3$ when
$n = (1b_{m-1}\ldots b_1b_0)_2$; this defines a(n), b(n), and c(n). Setting g(n) = n in the
recurrence implies that a(n) + c(n) − d(n) = n; hence we know everything.
[Setting g(n) = 1 gives the additional identity a(n) − 2b(n) − 2c(n) = 1,
which can be used to define b(n) in terms of the simpler functions a(n) and
a(n) + c(n).]

1.17 In general we have $W_m \le 2W_{m-k} + T_k$, for $0 \le k \le m$. (This relation
corresponds to transferring the top m − k, then using only three pegs to
move the bottom k, then finishing with the top m − k.) The stated relation
turns out to be based on the unique value of k that minimizes the right-hand
side of this general inequality, when m = n(n + 1)/2. (However, we
cannot conclude that equality holds; many other strategies for transferring
the tower are conceivable.) If we set $Y_n = (W_{n(n+1)/2} - 1)/2^n$, we find that
$Y_n \le Y_{n-1} + 1$; hence $W_{n(n+1)/2} \le 2^n(n-1) + 1$.
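Minimizing the general inequality over k gives the familiar Frame–Stewart upper bounds. The sketch computes them for m up to 66 and checks the triangular-number values $2^n(n-1) + 1$. As the answer stresses, these numbers are upper bounds obtained from one particular strategy; the code says nothing about optimality.

```python
def frame_stewart(m_max):
    """Upper bounds W_m <= min_{1<=k<=m} (2 W_{m-k} + T_k), T_k = 2^k - 1, W_0 = 0."""
    W = [0]
    for m in range(1, m_max + 1):
        W.append(min(2 * W[m - k] + (2 ** k - 1) for k in range(1, m + 1)))
    return W

W = frame_stewart(66)
```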

1.18 It suffices to show that both of the lines from $(n^{2j}, 0)$ intersect both of
the lines from $(n^{2k}, 0)$, and that all these intersection points are distinct.

A line from $(x_j, 0)$ through $(x_j - a_j, 1)$ intersects a line from $(x_k, 0)$
through $(x_k - a_k, 1)$ at the point $(x_j - ta_j, t)$ where $t = (x_k - x_j)/(a_k - a_j)$.
Let $x_j = n^{2j}$ and $a_j = n^j + (0 \text{ or } n^{-n})$. Then the ratio $t = (n^{2k} - n^{2j})/
(n^k - n^j + (-n^{-n} \text{ or } 0 \text{ or } n^{-n}))$ lies strictly between $n^j + n^k - 1$ and $n^j +
n^k + 1$; hence the y coordinate of the intersection point uniquely identifies j
and k. Also the four intersections that have the same j and k are distinct.

1.19 Not when n > 11. A bent line whose half-lines run at angles θ and
θ + 30° from its apex can intersect four times with another whose half-lines
run at angles φ and φ + 30° only if |θ − φ| > 30°. We can’t choose more
than 11 angles this far apart from each other. (Is it possible to choose 11?)

1.20 Let $h(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma_0 + e(n)\gamma_1$. We know
from (1.18) that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\beta_{b_{m-1}}\beta_{b_{m-2}}\ldots\beta_{b_1}\beta_{b_0})_4$
when $n = (1b_{m-1}\ldots b_1b_0)_2$; this defines a(n), b(n), and c(n). Setting
h(n) = n in the recurrence implies that a(n) + c(n) − 2d(n) − 2e(n) = n;
setting $h(n) = n^2$ implies that $a(n) + c(n) + 4e(n) = n^2$. Hence $d(n) =
(3a(n) + 3c(n) - n^2 - 2n)/4$; $e(n) = (n^2 - a(n) - c(n))/4$.

1.21 We can let m be the least (or any) common multiple of 2n, 2n − 1,
..., n + 1. [A non-rigorous argument suggests that a “random” value of m
will succeed with probability
$$\frac{n}{2n}\,\frac{n-1}{2n-1}\cdots\frac{1}{n+1} = 1\Big/\binom{2n}{n} \sim \frac{\sqrt{\pi n}}{4^n},$$
so we might expect to find such an m less than $4^n$.]
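The sketch below brute-forces small cases of the setup we understand exercise 1.21 to describe: 2n people in a circle, the last n are the "bad guys", and every mth person is executed. It checks that the least common multiple always works and finds the smallest working m for small n. Taking m = 30 for n = 4 is just a sample value that the simulation confirms.

```python
from math import lcm

def first_victims(total, m, count):
    """First `count` people eliminated when every m-th of 1..total is executed."""
    people = list(range(1, total + 1))
    out, i = [], 0
    for _ in range(count):
        i = (i + m - 1) % len(people)
        out.append(people.pop(i))
    return out

def bad_guys_first(n, m):
    """True if persons n+1..2n are the first n executed."""
    return sorted(first_victims(2 * n, m, n)) == list(range(n + 1, 2 * n + 1))

def smallest_m(n):
    m = 1
    while not bad_guys_first(n, m):
        m += 1
    return m
```

With m a multiple of every circle size, each count wraps exactly around the circle and removes the person just before the starting point. That is why the lcm choice eliminates 2n, 2n − 1, ..., n + 1 in turn.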

1.22 Take a regular polygon with $2^n$ sides and label the sides with the
elements of a “de Bruijn cycle” of length $2^n$. (This is a cyclic sequence of
0’s and 1’s in which all n-tuples of adjacent elements are different; see [207,
exercise 2.3.4.2-23] and [208, exercise 3.2.2-17].) Attach a very thin convex
extension to each side that’s labeled 1. The n sets are copies of the resulting
polygon, rotated by the length of k sides for k = 0, 1, ..., n − 1.
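One way to produce the needed binary de Bruijn cycle is the standard Lyndon-word (FKM) recursion, shown below for a 2-letter alphabet. This is a well-known construction, not necessarily the one in [207] or [208]. The test confirms that every cyclic window of length n appears exactly once.

```python
def de_bruijn_binary(n):
    """Binary de Bruijn cycle of length 2^n by concatenating Lyndon words (FKM algorithm)."""
    a = [0] * (n + 1)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:               # keep Lyndon words whose length divides n
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq
```

For n = 3 the cycle is 00010111, so the eight sides of the octagon would be labeled in that order.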

1.23 Yes. (We need principles of elementary number theory from Chapter
4.) Let L(n) = lcm(1, 2, ..., n). We can assume that n > 2; hence by
Bertrand’s postulate there is a prime p between n/2 and n. We can also

I once rode a 
de Bruijn cycle 
(when visiting at 
his home in Nuenen, 
The Netherlands). 
assume that j > n/2, since q′ = L(n) + 1 − q leaves j′ = n + 1 − j if and
only if q leaves j. Choose q so that q ≡ 1 (mod L(n)/p) and q ≡ j + 1 − n
(mod p). The people are now removed in order 1, 2, ..., n − p, j + 1, j + 2,
..., n, n − p + 1, ..., j − 1.

1.24 The only known examples are: $X_n = 1/X_{n-1}$, which has period 2;
Gauss’s recurrence of period 5 in exercise 8; H. Todd’s even more remarkable
recurrence $X_n = (1 + X_{n-1} + X_{n-2})/X_{n-3}$, which has period 8 (see [261]); and
recurrences derived from these when we replace $X_n$ by a constant times $X_{mn}$.
We can assume that the first nonzero coefficient in the denominator is unity,
and that the first nonzero coefficient in the numerator (if any) has nonnegative
real part. Computer algebra shows easily that there are no further solutions of
period ≤ 5 when k = 2. A partial theory has been developed by Lyness [261,
262] and by Kurshan and Gopinath [231].

An interesting example of another type, with period 9 when the starting
values are real, is the recurrence $X_n = |X_{n-1}| - X_{n-2}$ discovered by Morton
Brown [43]. Nonlinear recurrences having any desired period ≥ 5 can be based
on continuants [65].
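The periods quoted above are easy to observe with exact rational arithmetic. The sketch iterates each recurrence from sample starting values of our own choosing and reports the first index at which the initial window reappears. The Gauss recurrence is taken to be $Q_n = (1 + Q_{n-1})/Q_{n-2}$, consistent with the values listed in answer 1.8.

```python
from fractions import Fraction as F

def period(step, start, limit=100):
    """Smallest p > 0 with (X_p, ..., X_{p+k-1}) == (X_0, ..., X_{k-1}), or None."""
    k = len(start)
    xs = list(start)
    for p in range(1, limit + 1):
        xs.append(step(xs[-k:]))          # xs now holds X_0 .. X_{p+k-1}
        if xs[p:p + k] == xs[:k]:
            return p
    return None

examples = {
    "reciprocal": (lambda w: 1 / w[0], [F(3)]),
    "Gauss":      (lambda w: (1 + w[1]) / w[0], [F(2), F(3)]),
    "Todd":       (lambda w: (1 + w[2] + w[1]) / w[0], [F(2), F(3), F(5)]),
    "Brown":      (lambda w: abs(w[1]) - w[0], [F(1), F(2)]),
}
periods = {name: period(step, start) for name, (step, start) in examples.items()}
```

Exact fractions avoid any doubt about whether a value has "really" returned, which floating point could blur.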

1.25 If $T^{(k)}(n)$ denotes the minimum number of moves needed to transfer n
disks with k auxiliary pegs (hence $T^{(1)}(n) = T_n$ and $T^{(2)}(n) = W_n$), we have
$T^{(k)}\bigl(\binom{n+1}{k}\bigr) \le 2T^{(k)}\bigl(\binom nk\bigr) + T^{(k-1)}\bigl(\binom{n}{k-1}\bigr)$. No examples (n, k) are known
where this inequality fails to be an equality. When k is small compared with
n, the formula $2^{n+1-k}\binom{n-1}{k-1}$ gives a convenient (but non-optimum) upper
bound on $T^{(k)}\bigl(\binom nk\bigr)$.

1.26 The execution-order permutation can be computed in O(n log n) steps
for all m and n [209, exercises 5.1.1-2 and 5.1.1-5]. Bjorn Poonen has proved
that non-Josephus sets with exactly four “bad guys” exist whenever n ≡ 0
(mod 3) and n ≥ 9; in fact, the number of such sets is at least $\epsilon\binom n4$ for some
ε > 0. He also found by extensive computations that the only other n < 24
with non-Josephus sets is n = 20, which has 236 such sets with k = 14 and
two with k = 13. (One of the latter is {1, 2, 3, 4, 5, 6, 7, 8, 11, 14, 15, 16, 17}; the
other is its reflection with respect to 21.) There is a unique non-Josephus set
with n = 15 and k = 9, namely {3, 4, 5, 6, 8, 10, 11, 12, 13}.
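For readers who want to experiment with execution orders, here is one O(n log n) method. It uses a Fenwick (binary indexed) tree to count survivors, so the r-th survivor can be located by binary lifting. This is a standard data-structure technique swapped in for illustration; the method of [209] may differ.

```python
def josephus_order(n, m):
    """Execution order when every m-th of 1..n is removed, in O(n log n) time."""
    tree = [0] * (n + 1)
    for i in range(1, n + 1):          # O(n) Fenwick build: every position occupied
        tree[i] += 1
        parent = i + (i & -i)
        if parent <= n:
            tree[parent] += tree[i]

    def kth_survivor(k):
        """Position of the k-th surviving person (k >= 1), by binary lifting."""
        pos, step = 0, 1 << n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= n and tree[nxt] < k:
                pos, k = nxt, k - tree[nxt]
            step >>= 1
        return pos + 1

    def remove(i):
        while i <= n:
            tree[i] -= 1
            i += i & -i

    order, rank = [], 0                # rank: 0-based survivor index where counting resumes
    for size in range(n, 0, -1):
        rank = (rank + m - 1) % size
        pos = kth_survivor(rank + 1)
        remove(pos)
        order.append(pos)
    return order
```

After a removal, the next person inherits the removed person's rank, so the count resumes at the same rank modulo the new circle size.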

2.1 There’s no agreement about this; three answers are defensible: (1) We
can say that $\sum_{k=m}^n q_k$ is always equivalent to $\sum_{m\le k\le n} q_k$; then the stated
sum is zero. (2) A person might say that the given sum is $q_4 + q_3 + q_2 +
q_1 + q_0$, by summing over decreasing values of k. But this conflicts with the
generally accepted convention that $\sum_{k=1}^n q_k = 0$ when n = 0. (3) We can
say that $\sum_{k=m}^n q_k = \sum_{k\le n} q_k - \sum_{k<m} q_k$; then the stated sum is equal to
$-q_1 - q_2 - q_3$. This convention may appear strange, but it obeys the useful
law $\sum_{k=a}^b + \sum_{k=b+1}^c = \sum_{k=a}^c$ for all a, b, c.

It’s best to use the notation $\sum_{k=m}^n$ only when n − m ≥ −1; then both
conventions (1) and (3) agree.
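Convention (3) is easy to implement. The sketch below uses it to check both the value of the stated sum and the additivity law, with a sample summand of our choosing. When n ≥ m − 1 the function agrees with the ordinary sum, which is why conventions (1) and (3) coincide there.

```python
def sum_conv3(q, m, n):
    """sum_{k=m}^{n} q(k) under convention (3): sum_{k<=n} q(k) - sum_{k<m} q(k)."""
    if n >= m - 1:
        return sum(q(k) for k in range(m, n + 1))     # ordinary range (possibly empty)
    return -sum(q(k) for k in range(n + 1, m))        # "reversed" range counts negatively

def q(k):
    return k * k + 1                                  # an arbitrary sample summand
```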

2.2 This is |x|. Incidentally, the quantity ([x>0] − [x<0]) is often called
sign(x) or signum(x); it is +1 when x > 0, 0 when x = 0, and −1 when x < 0.

2.3 The first sum is, of course, $a_0 + a_1 + a_2 + a_3 + a_4 + a_5$; the second is $a_4 +
a_1 + a_0 + a_1 + a_4$, because the sum is over the values k ∈ {−2, −1, 0, +1, +2}.
The commutative law doesn’t hold here because the function p(k) = k² is not
a permutation. Some values of n (e.g., n = 3) have no k such that p(k) = n;
others (e.g., n = 4) have two such k.

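2.4 (a) $\sum_{i=1}^4\sum_{j=i+1}^4\sum_{k=j+1}^4 a_{ijk} = \sum_{i=1}^2\sum_{j=i+1}^3\sum_{k=j+1}^4 a_{ijk} =
((a_{123} + a_{124}) + a_{134}) + a_{234}$.

(b) $\sum_{k=1}^4\sum_{j=1}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = \sum_{k=3}^4\sum_{j=2}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = a_{123} + (a_{124} +
(a_{134} + a_{234}))$.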

2.5 The same index ‘k’ is being used for two different index variables, although
k is bound in the inner sum. This is a famous mistake in mathematics
(and computer programming). The result turns out to be correct if $a_j = a_k$
for all j and k, 1 ≤ j, k ≤ n.

2.6 It’s [1 ≤ j ≤ n](n − j + 1). The first factor is necessary here because we
should get zero when j < 1 or j > n.

2.7 $mx^{\overline{m-1}}$. A version of finite calculus based on ∇ instead of Δ would
therefore give special prominence to rising factorial powers.

2.8 0, if m ≥ 1; 1/|m|!, if m ≤ 0.

2.9 $x^{\overline{m+n}} = x^{\overline m}(x+m)^{\overline n}$, for integers m and n. Setting m = −n tells us
that $x^{\overline{-n}} = 1/(x-n)^{\overline n} = 1/(x-1)^{\underline n}$.

2.10 Another possible right-hand side is $Eu\,\Delta v + v\,\Delta u$.

2.11 Break the left-hand side into two sums, and change k to k + 1 in the 
second of these. 

2.12 If p(k) = n then $n + c = k + ((-1)^k + 1)c$ and $((-1)^k + 1)$ is even;
hence $(-1)^{n+c} = (-1)^k$ and $k = n - (-1)^{n+c}c$. Conversely, this value of k
yields p(k) = n.

2.13 Let $R_0 = \alpha$, and $R_n = R_{n-1} + (-1)^n(\beta + n\gamma + n^2\delta)$ for n > 0. Then
$R(n) = A(n)\alpha + B(n)\beta + C(n)\gamma + D(n)\delta$. Setting $R_n = 1$ yields A(n) = 1.
Setting $R_n = (-1)^n$ yields $A(n) + 2B(n) = (-1)^n$. Setting $R_n = (-1)^nn$
yields $-B(n) + 2C(n) = (-1)^nn$. Setting $R_n = (-1)^nn^2$ yields $B(n) - 2C(n) +
2D(n) = (-1)^nn^2$. Therefore $2D(n) = (-1)^n(n^2 + n)$; the stated sum is D(n).
2.14 The suggested rewrite is legitimate since we have $k = \sum_{1\le j\le k} 1$ when
1 ≤ k ≤ n. Sum first on k; the multiple sum reduces to
$$\sum_{1\le j\le n}(2^{n+1} - 2^j) = n2^{n+1} - (2^{n+1} - 2).$$
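Both closed forms, $\sum_{k=0}^n(-1)^kk^2 = D(n) = (-1)^n(n^2+n)/2$ from answer 2.13 and $\sum_{k=1}^n k2^k = n2^{n+1} - (2^{n+1} - 2)$ from answer 2.14, can be spot-checked by brute force. The helper names are ours.

```python
def alt_square_sum(n):
    """sum_{k=0}^{n} (-1)^k k^2."""
    return sum((-1) ** k * k * k for k in range(n + 1))

def k_two_k_sum(n):
    """sum_{k=1}^{n} k 2^k."""
    return sum(k * 2 ** k for k in range(1, n + 1))
```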

2.15 The first step replaces k(k + 1) by $2\sum_{1\le j\le k} j$. The second step gives
$\boxminus_n + \square_n = \bigl(\sum_{k=1}^n k\bigr)^2 + \square_n$.

2.16 $x^{\underline m}(x-m)^{\underline n} = x^{\underline{m+n}} = x^{\underline n}(x-n)^{\underline m}$, by (2.52).

2.17 Use induction for the first two =’s, and (2.52) for the third. The second 
line follows from the first. 


2.18 Use the facts that $(\Re z)^+ \le |z|$, $(\Re z)^- \le |z|$, $(\Im z)^+ \le |z|$, $(\Im z)^- \le |z|$,
and $|z| \le (\Re z)^+ + (\Re z)^- + (\Im z)^+ + (\Im z)^-$.

2.19 Multiply both sides by $2^{n-1}/n!$ and let $S_n = 2^nT_n/n! = S_{n-1} + 3\cdot
2^{n-1} = 3(2^n - 1) + S_0$. The solution is $T_n = 3\cdot n! + n!/2^{n-1}$. (We’ll see in
Chapter 4 that $T_n$ is an integer only when n is 0 or a power of 2.)
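Assuming the recurrence of exercise 2.19 is $T_0 = 5$, $2T_n = nT_{n-1} + 3\cdot n!$ (the exercise itself is not reproduced here), this sketch iterates it exactly. It compares the result with the closed form and lists the n for which $T_n$ is an integer.

```python
from fractions import Fraction
from math import factorial

def T_values(n_max):
    """T_0 = 5 and 2 T_n = n T_{n-1} + 3 n!  (recurrence assumed from exercise 2.19)."""
    T = [Fraction(5)]
    for n in range(1, n_max + 1):
        T.append((n * T[-1] + 3 * factorial(n)) / 2)
    return T

T = T_values(40)
integral = [n for n in range(41) if T[n].denominator == 1]
```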


“It is a profoundly
erroneous truism,
repeated by all
copybooks and by
eminent people
when they are
making speeches,
that we should
cultivate the habit
of thinking of what
we are doing. The
precise opposite is
the case. Civilization
advances by
extending the
number of important
operations which
we can perform
without thinking
about them. Operations
of thought are
like cavalry charges
in a battle — they
are strictly limited
in number, they
require fresh horses,
and must only be
made at decisive
moments.”

—A. N. Whitehead [370]


2.20 The perturbation method gives
$$S_n + (n+1)H_{n+1} = S_n + \sum_{0\le k\le n} H_k + n + 1.$$


2.21 Extracting the final term of $S_{n+1}$ gives $S_{n+1} = 1 - S_n$; extracting the
first term gives
$$S_{n+1} = (-1)^{n+1} + \sum_{1\le k\le n+1}(-1)^{n+1-k} = (-1)^{n+1} + \sum_{0\le k\le n}(-1)^{n-k} = (-1)^{n+1} + S_n.$$
Hence $2S_n = 1 + (-1)^n$ and we have $S_n = [n \text{ is even}]$. Similarly, we find
$$T_{n+1} = n + 1 - T_n = \sum_{k=0}^n(-1)^{n-k}(k+1) = T_n + S_n,$$
hence $2T_n = n + 1 - S_n$ and we have $T_n = \frac12(n + [n \text{ is odd}])$. Finally, the
same approach yields
$$\begin{aligned} U_{n+1} = (n+1)^2 - U_n &= U_n + 2T_n + S_n\\ &= U_n + n + [n \text{ is odd}] + [n \text{ is even}]\\ &= U_n + n + 1. \end{aligned}$$
Hence $U_n$ is the triangular number $\frac12(n+1)n$.
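The identities from answers 2.20 and 2.21 can be spot-checked with exact arithmetic. The first follows from the perturbation equation: $\sum_{0\le k\le n}H_k = (n+1)H_{n+1} - (n+1)$. The others are the alternating sums $S_n$, $T_n$, $U_n$ with summands 1, k, and $k^2$.

```python
from fractions import Fraction

def H(n):
    """Harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def alternating(n, f):
    """sum_{k=0}^{n} (-1)^(n-k) f(k)."""
    return sum((-1) ** (n - k) * f(k) for k in range(n + 1))
```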
2.22 Twice the general sum gives a “vanilla” sum over 1 ≤ j, k ≤ n, which
splits and yields twice $\bigl(\sum_k a_kA_k\bigr)\bigl(\sum_k b_kB_k\bigr) - \bigl(\sum_k a_kB_k\bigr)\bigl(\sum_k b_kA_k\bigr)$.
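A randomized check of this identity follows. We assume the "general sum" of exercise 2.22 is $\sum_{1\le j<k\le n}(a_jb_k - a_kb_j)(A_jB_k - A_kB_j)$, the form that the stated result matches. Integer inputs make the comparison exact.

```python
import random

def pair_sum(a, b, A, B):
    """sum over 1 <= j < k <= n of (a_j b_k - a_k b_j)(A_j B_k - A_k B_j)."""
    n = len(a)
    return sum((a[j] * b[k] - a[k] * b[j]) * (A[j] * B[k] - A[k] * B[j])
               for j in range(n) for k in range(j + 1, n))

def product_form(a, b, A, B):
    """(sum a_k A_k)(sum b_k B_k) - (sum a_k B_k)(sum b_k A_k)."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    return dot(a, A) * dot(b, B) - dot(a, B) * dot(b, A)

rng = random.Random(2024)
trials = [[[rng.randint(-9, 9) for _ in range(6)] for _ in range(4)] for _ in range(50)]
```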

2.23 (a) This approach gives four sums that evaluate to $2n + H_n - 2n +
H_n + \frac{1}{n+1} - 1$. (It would have been easier to replace the summand by
$1/k + 1/(k+1)$.) (b) Let $u(x) = 2x + 1$ and $\Delta v(x) = 1/x(x+1) = (x-1)^{\underline{-2}}$;
then $\Delta u(x) = 2$ and $v(x) = -(x-1)^{\underline{-1}} = -1/x$. The answer is $2H_n - \frac{n}{n+1}$.

2.24 Summing by parts, $\sum x^{\underline m}H_x\,\delta x = x^{\underline{m+1}}H_x/(m+1) - x^{\underline{m+1}}/(m+1)^2 +
C$; hence $\sum_{0\le k<n} k^{\underline m}H_k = n^{\underline{m+1}}\bigl(H_n - 1/(m+1)\bigr)/(m+1) + 0^{\underline{m+1}}/(m+1)^2$.

In our case m = −2, so the sum comes to $1 - (H_n + 1)/(n+1)$.

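2.25 Here are some of the basic analogies:
$$\sum_{k\in K}ca_k = c\sum_{k\in K}a_k \quad\longleftrightarrow\quad \prod_{k\in K}a_k^c = \Bigl(\prod_{k\in K}a_k\Bigr)^c$$
$$\sum_{k\in K}(a_k + b_k) = \sum_{k\in K}a_k + \sum_{k\in K}b_k \quad\longleftrightarrow\quad \prod_{k\in K}a_kb_k = \Bigl(\prod_{k\in K}a_k\Bigr)\Bigl(\prod_{k\in K}b_k\Bigr)$$
$$\sum_{k\in K}a_k = \sum_{p(k)\in K}a_{p(k)} \quad\longleftrightarrow\quad \prod_{k\in K}a_k = \prod_{p(k)\in K}a_{p(k)}$$
$$\sum_{j\in J,\,k\in K}a_{j,k} = \sum_{j\in J}\sum_{k\in K}a_{j,k} \quad\longleftrightarrow\quad \prod_{j\in J,\,k\in K}a_{j,k} = \prod_{j\in J}\prod_{k\in K}a_{j,k}$$
$$\sum_{k\in K}a_k = \sum_k a_k[k\in K] \quad\longleftrightarrow\quad \prod_{k\in K}a_k = \prod_k a_k^{[k\in K]}$$
$$\sum_{k\in K}1 = \#K \quad\longleftrightarrow\quad \prod_{k\in K}c = c^{\#K}$$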

2.26 $P^2 = \bigl(\prod_{1\le j,k\le n}a_ja_k\bigr)\bigl(\prod_{1\le j=k\le n}a_ja_k\bigr)$. The first factor is equal to
$\bigl(\prod_{k=1}^n a_k\bigr)^{2n}$; the second factor is $\prod_{k=1}^n a_k^2$. Hence $P = \bigl(\prod_{k=1}^n a_k\bigr)^{n+1}$.

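2.27 $\Delta(c^{\underline x}) = c^{\underline x}(c - x - 1) = c^{\underline{x+2}}/(c - x)$. Setting c = −2 and decreasing
x by 2 yields $\Delta\bigl(-(-2)^{\underline{x-2}}\bigr) = (-2)^{\underline x}/x$, hence the stated sum is $(-2)^{\underline{-1}} -
(-2)^{\underline{n-1}} = (-1)^nn! - 1$.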

2.28 The interchange of summation between the second and third lines is
not justifiable; the terms of this sum do not converge absolutely. Everything
else is perfectly correct, except that the result of $\sum_{k\ge1}[k = j-1]k/j$ should
perhaps have been written $[j - 1 \ge 1](j - 1)/j$ and simplified explicitly.

As opposed to
imperfectly correct.


2.29 Use partial fractions to get
$$\frac{k}{4k^2 - 1} = \frac14\Bigl(\frac{1}{2k+1} + \frac{1}{2k-1}\Bigr).$$
The $(-1)^k$ factor now makes the two halves of each term cancel with their
neighbors. Hence the answer is $-1/4 + (-1)^n/(8n + 4)$.
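The closed forms of answers 2.27 and 2.29 are checked below with exact fractions. We assume the sums are $\sum_{k=1}^n(-2)^{\underline k}/k$ and $\sum_{k=1}^n(-1)^kk/(4k^2-1)$, which is what the answers above evaluate; the exercises themselves are not reproduced here.

```python
from fractions import Fraction
from math import factorial

def falling(x, k):
    """Falling factorial power x^{underline k} = x (x - 1) ... (x - k + 1), k >= 0."""
    p = 1
    for i in range(k):
        p *= x - i
    return p

def sum_227(n):
    """sum_{k=1}^{n} (-2)^{underline k} / k."""
    return sum(Fraction(falling(-2, k), k) for k in range(1, n + 1))

def sum_229(n):
    """sum_{k=1}^{n} (-1)^k k / (4k^2 - 1)."""
    return sum(Fraction((-1) ** k * k, 4 * k * k - 1) for k in range(1, n + 1))
```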

2.30 $\sum_{a\le k<b}k = \frac12(b^{\underline2} - a^{\underline2}) = \frac12(b - a)(b + a - 1)$. So we have
$$(b - a)(b + a - 1) = 2100 = 2^2\cdot3\cdot5^2\cdot7.$$
There is one solution for each way to write 2100 = x·y where x is even and
y is odd; we let $a = \frac12|x - y| + \frac12$ and $b = \frac12(x + y) + \frac12$. So the number of
solutions is the number of divisors of $3\cdot5^2\cdot7$, namely 12. In general, there
are $\prod_{p>2}(n_p + 1)$ ways to represent $\prod_p p^{n_p}$, where the products range over
primes.
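A brute-force count agrees with the divisor argument. The sketch lists every run of consecutive positive integers a + (a+1) + ⋯ + (b−1) summing to 1050 (half of 2100, as in the equation above). It confirms that each run satisfies (b − a)(b + a − 1) = 2100 and that the number of runs equals the number of odd divisors.

```python
def consecutive_representations(N):
    """All (a, b) with 1 <= a < b and a + (a+1) + ... + (b-1) = N."""
    reps = []
    for a in range(1, N + 1):
        total, b = 0, a
        while total < N:
            total += b
            b += 1
        if total == N:
            reps.append((a, b))
    return reps

reps = consecutive_representations(1050)
```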

2.31 $\sum_{j,k\ge2}j^{-k} = \sum_{j\ge2}1/j^2(1 - 1/j) = \sum_{j\ge2}1/j(j-1) = 1$. The second sum
is, similarly, 3/4.

2.32 If 2n ≤ x < 2n + 1, the sums are $0 + \cdots + n + (x - n - 1) + \cdots + (x - 2n) =
n(x - n) = (x - 1) + (x - 3) + \cdots + (x - 2n + 1)$. If 2n − 1 ≤ x < 2n they are,
similarly, both equal to n(x − n). (Looking ahead to Chapter 3, the formula
$\lfloor\frac{x+1}{2}\rfloor\bigl(x - \lfloor\frac{x+1}{2}\rfloor\bigr)$ covers both cases.)
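To check the common value and the Chapter 3 formula numerically, the sketch assumes the two sums of exercise 2.32 are $\sum_{k\ge0}\min(k, x \dot- k)$ and $\sum_{k\ge0}(x \dot- (2k+1))$, where $x \dot- y = \max(0, x - y)$. This reading matches the terms written out above. It tests x = 0, 1/4, ..., 20 exactly.

```python
from fractions import Fraction
from math import ceil, floor

def monus(x, y):
    """x minus y, truncated at zero."""
    return max(x - y, 0)

def sum_min(x):
    return sum(min(k, monus(x, k)) for k in range(ceil(x) + 1))

def sum_odd(x):
    return sum(monus(x, 2 * k + 1) for k in range(ceil(x) + 1))

def closed(x):
    m = floor((x + 1) / 2)
    return m * (x - m)

samples = [Fraction(i, 4) for i in range(81)]
```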

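2.33 If K is empty, $\bigwedge_{k\in K}a_k = \infty$. The basic analogies are:
$$\sum_{k\in K}ca_k = c\sum_{k\in K}a_k \quad\longleftrightarrow\quad \bigwedge_{k\in K}(c + a_k) = c + \bigwedge_{k\in K}a_k$$
$$\sum_{k\in K}(a_k + b_k) = \sum_{k\in K}a_k + \sum_{k\in K}b_k \quad\longleftrightarrow\quad \bigwedge_{k\in K}\min(a_k, b_k) = \min\Bigl(\bigwedge_{k\in K}a_k, \bigwedge_{k\in K}b_k\Bigr)$$
$$\sum_{k\in K}a_k = \sum_{p(k)\in K}a_{p(k)} \quad\longleftrightarrow\quad \bigwedge_{k\in K}a_k = \bigwedge_{p(k)\in K}a_{p(k)}$$
$$\sum_{j\in J,\,k\in K}a_{j,k} = \sum_{j\in J}\sum_{k\in K}a_{j,k} \quad\longleftrightarrow\quad \bigwedge_{j\in J,\,k\in K}a_{j,k} = \bigwedge_{j\in J}\bigwedge_{k\in K}a_{j,k}$$
$$\sum_{k\in K}a_k = \sum_k a_k[k\in K] \quad\longleftrightarrow\quad \bigwedge_{k\in K}a_k = \bigwedge_k\bigl(a_k + \infty[k\notin K]\bigr)$$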

A permutation that 
consumes terms of 
one sign faster than 
those of the other 
can steer the sum 
toward any value 
that it likes. 


2.34 Let $K^+ = \{k \mid a_k \ge 0\}$ and $K^- = \{k \mid a_k < 0\}$. Then if, for example,
n is odd, we choose $F_n$ to be $F_{n-1}\cup E_n$, where $E_n \subseteq K^-$ is sufficiently large
that $\sum_{k\in(F_{n-1}\cap K^+)}a_k - \sum_{k\in E_n}(-a_k) < A^-$.

2.35 Goldbach’s sum can be shown to equal
$$\sum_{m,n\ge2}m^{-n} = \sum_{m\ge2}\frac{1}{m(m-1)} = 1$$
as follows: By unsumming a geometric series, it equals $\sum_{k\in P,\,l\ge1}k^{-l}$; therefore
the proof will be complete if we can find a one-to-one correspondence
between ordered pairs (m, n) with m, n ≥ 2 and ordered pairs (k, l) with
k ∈ P and l ≥ 1, where $m^n = k^l$ when the pairs correspond. If m ∉ P we let
$(m, n) \leftrightarrow (m^n, 1)$; but if $m = a^b \in P$, we let $(m, n) \leftrightarrow (a^n, b)$.
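Goldbach's theorem can also be watched converging. Taking P to be the set of perfect powers $m^n$ with m, n ≥ 2 (as the correspondence above indicates), the sketch sums $1/(k-1)$ over all such k up to $10^{10}$. The omitted tail is dominated by squares beyond $10^{10}$ and contributes only about $10^{-5}$.

```python
import math

def perfect_powers(limit):
    """The set P of perfect powers m^n (m, n >= 2) not exceeding limit."""
    P = set()
    for m in range(2, math.isqrt(limit) + 1):
        p = m * m
        while p <= limit:
            P.add(p)          # the set removes duplicates such as 16 = 2^4 = 4^2
            p *= m
    return P

P = perfect_powers(10 ** 10)
goldbach = math.fsum(1 / (k - 1) for k in P)   # partial sum of sum_{k in P} 1/(k-1)
```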

2.36 (a) By definition, g(n) − g(n−1) = f(n). (b) By part (a), $g(g(n)) -
g(g(n-1)) = \sum_k f(k)[g(n-1) < k \le g(n)] = n\bigl(g(n) - g(n-1)\bigr) = nf(n)$.
(c) By part (a) again, $g(g(g(n))) - g(g(g(n-1)))$ is
$$\begin{aligned}
\sum_k f(k)\,[g(g(n-1)) < k \le g(g(n))] &= \sum_{j,k} j\,[j = f(k)]\,[g(g(n-1)) < k \le g(g(n))]\\
&= \sum_{j,k} j\,[j = f(k)]\,[g(n-1) < j \le g(n)]\\
&= \sum_j j\bigl(g(j) - g(j-1)\bigr)[g(n-1) < j \le g(n)]\\
&= \sum_j jf(j)\,[g(n-1) < j \le g(n)] = n\sum_j j\,[g(n-1) < j \le g(n)].
\end{aligned}$$
Colin Mallows observes that the sequence can also be defined by the recurrence
$$f(1) = 1; \qquad f(n+1) = 1 + f\bigl(n + 1 - f(f(n))\bigr), \quad\text{for } n > 0.$$
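Mallows's recurrence makes Golomb's sequence easy to generate. The sketch checks three things: the self-describing property (each k occurs f(k) times), the identity of part (b), and the leading term $\phi^{2-\phi}n^{\phi-1}$ from exercise 9.63. The last check uses a deliberately loose ±10% band, since the error term there is only $O(1/\log n)$ in relative terms.

```python
import math
from collections import Counter
from itertools import accumulate

def golomb(n_max):
    """f(1..n_max) from f(1) = 1, f(n+1) = 1 + f(n+1 - f(f(n)))."""
    f = [0, 1]                         # f[0] is an unused placeholder
    for n in range(1, n_max):
        f.append(1 + f[n + 1 - f[f[n]]])
    return f

f = golomb(100000)
g = list(accumulate(f))                # g(n) = f(1) + ... + f(n), with g(0) = 0
phi = (1 + math.sqrt(5)) / 2

def estimate(n):
    """Leading term phi^(2 - phi) n^(phi - 1) of exercise 9.63."""
    return phi ** (2 - phi) * n ** (phi - 1)
```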

2.37 (RLG thinks they probably won’t fit; DEK thinks they probably will; 
OP is not committing himself.) 

3.1 $m = \lfloor\lg n\rfloor$; $l = n - 2^m = n - 2^{\lfloor\lg n\rfloor}$.

3.2 (a) $\lfloor x + .5\rfloor$. (b) $\lceil x - .5\rceil$.

3.3 This is $\lfloor mn - \{m\alpha\}n/\alpha\rfloor = mn - 1$, since $0 < \{m\alpha\} < 1$.

3.4 Something where no proof is required, only a lucky guess (I guess). 

3.5 We have $\lfloor nx\rfloor = \lfloor n\lfloor x\rfloor + n\{x\}\rfloor = n\lfloor x\rfloor + \lfloor n\{x\}\rfloor$ by (3.8) and (3.6).
Therefore $\lfloor nx\rfloor = n\lfloor x\rfloor \iff \lfloor n\{x\}\rfloor = 0 \iff 0 \le n\{x\} < 1 \iff \{x\} < 1/n$,
assuming that n is a positive integer. (Notice that $n\lfloor x\rfloor \le \lfloor nx\rfloor$ for all x in
this case.)

3.6 $\lfloor f(x)\rfloor = \lfloor f(\lceil x\rceil)\rfloor$.

3.7 $\lfloor n/m\rfloor + n \bmod m$.

3.8 If all boxes contain $< \lceil n/m\rceil$ objects, then $n \le (\lceil n/m\rceil - 1)m$, so
$n/m + 1 \le \lceil n/m\rceil$, contradicting (3.5). The other proof is similar.


With this self-description,
Golomb’s sequence
wouldn’t do too well
on the Dating Game.
3.9 We have m/n − 1/q = (n mumble m)/qn. The process must terminate,
because 0 ≤ n mumble m < m. The denominators of the representation are
strictly increasing, hence distinct, because qn/(n mumble m) > q.
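The process in answer 3.9 is the greedy (Fibonacci–Sylvester) expansion into unit fractions. In the sketch we take q = ⌈n/m⌉ and n mumble m = (−n) mod m = mq − n; both readings are assumptions, since the exercise is not reproduced here. The test confirms that the terms sum to m/n and that the denominators strictly increase.

```python
from fractions import Fraction

def greedy_unit_fractions(m, n):
    """Denominators q1 < q2 < ... with m/n = 1/q1 + 1/q2 + ..., for 0 < m < n."""
    denominators = []
    while m > 0:
        q = -(-n // m)            # q = ceil(n/m), the largest unit fraction 1/q <= m/n
        denominators.append(q)
        m, n = (-n) % m, q * n    # m/n - 1/q = (n mumble m)/(qn), with n mumble m = (-n) mod m
    return denominators
```

For example, `greedy_unit_fractions(4, 5)` gives the denominators 2, 4, 20.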

3.10 $\lceil x + \frac12\rceil - [(2x+1)/4 \text{ is not an integer}]$ is the nearest integer to x, if
$\{x\} \ne \frac12$; otherwise it’s the nearest even integer. (See exercise 2.) Thus the
formula gives an “unbiased” way to round.
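The formula amounts to round-half-to-even, the same rule that Python's built-in `round` applies to exact `Fraction` values. The sketch implements the expression with exact rationals, so ties are recognized reliably, and compares the two.

```python
from fractions import Fraction
from math import ceil

def unbiased_round(x):
    """ceil(x + 1/2) - [(2x + 1)/4 is not an integer], with x converted to an exact Fraction."""
    x = Fraction(x)
    y = (2 * x + 1) / 4
    return ceil(x + Fraction(1, 2)) - (0 if y.denominator == 1 else 1)
```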

3.11 If n is an integer, $\alpha < n < \beta \iff \lfloor\alpha\rfloor < n < \lceil\beta\rceil$. The number of
integers satisfying a < n < b when a and b are integers is $(b - a - 1)[b > a]$.
We would therefore get the wrong answer if α = β = integer.

3.12 Subtract ⌊n/m⌋ from both sides, by (3.6), getting $\lceil(n \bmod m)/m\rceil =
\lfloor(n \bmod m + m - 1)/m\rfloor$. Both sides are now equal to [n mod m > 0], since
0 ≤ n mod m < m.

A shorter but less direct proof simply observes that the first term in 
(3.24) must equal the last term in (3.25). 

3.13 If they form a partition, the text’s formula for N(α, n) implies that
1/α + 1/β = 1, because the coefficients of n in the equation N(α, n) +
N(β, n) = n must agree if the equation is to hold for large n. Hence α
and β are both rational or both irrational. If both are irrational, we do get
a partition, as shown in the text. If both can be written with numerator m,
the value m − 1 occurs in neither spectrum, and m occurs in both. (However,
Golomb [151] has observed that the sets $\{\lfloor n\alpha\rfloor \mid n \ge 1\}$ and $\{\lceil n\beta\rceil - 1 \mid n \ge 1\}$
always do form a partition, when 1/α + 1/β = 1.)
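Both situations are easy to observe directly. The sketch takes α = √2 and β = 2 + √2 for the irrational case, computing ⌊n√2⌋ exactly as `isqrt(2n²)`. It takes α = 3/2 and β = 3 (numerator 3 for both) for the rational case, where the plain spectra collide but Golomb's variant partitions the integers.

```python
from fractions import Fraction
from math import ceil, floor, isqrt

def spectrum(f, limit):
    """List f(1), f(2), ... while the values stay <= limit (f increasing)."""
    out, n = [], 1
    while f(n) <= limit:
        out.append(f(n))
        n += 1
    return out

L = 1000
irrational = spectrum(lambda n: isqrt(2 * n * n), L) + spectrum(lambda n: 2 * n + isqrt(2 * n * n), L)
plain = spectrum(lambda n: floor(Fraction(3, 2) * n), L) + spectrum(lambda n: 3 * n, L)
golomb = spectrum(lambda n: floor(Fraction(3, 2) * n), L) + spectrum(lambda n: ceil(3 * n) - 1, L)
```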

3.14 It’s obvious by (3.22) if ny = 0, otherwise true by (3.21) and (3.6). 

3.15 Plug in ⌈mx⌉ for n in (3.24): $\lceil mx\rceil = \lceil x\rceil + \lceil x - \frac1m\rceil + \cdots + \lceil x - \frac{m-1}{m}\rceil$.

3.16 The formula $n \bmod 3 = 1 + \frac13\bigl((\omega - 1)\omega^n - (\omega + 2)\omega^{2n}\bigr)$ can be verified
by checking it when 0 ≤ n < 3.

A general formula for n mod m, when m is any positive integer, appears
in exercise 7.25.
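A floating-point spot check of the formula needs one assumption, namely that ω is the primitive cube root of unity $e^{2\pi i/3}$, as in the exercise. Because $\omega^3 = 1$, checking 0 ≤ n < 3 suffices in principle. The loop below simply tries more values.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)     # assumed: a primitive cube root of unity

def mod3(n):
    """1 + ((omega - 1) omega^n - (omega + 2) omega^(2n)) / 3, a complex number."""
    return 1 + ((omega - 1) * omega ** n - (omega + 2) * omega ** (2 * n)) / 3
```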

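3.17 $\sum_{j,k}[0 \le k < m][1 \le j \le x + k/m] = \sum_{j,k}[0 \le k < m][1 \le j \le \lceil x\rceil]
[k \ge m(j - x)] = \sum_{1\le j\le\lceil x\rceil}\sum_k[0 \le k < m] - \sum_{j=\lceil x\rceil}\sum_k[0 \le k < m(j - x)] =
m\lceil x\rceil - \lceil m(\lceil x\rceil - x)\rceil = -\lceil -mx\rceil = \lfloor mx\rfloor$.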

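3.18 We have
$$S = \sum_{0\le j<\lceil n\alpha\rceil}\;\sum_{k\ge n}\bigl[j\alpha^{-1} \le k < (j + v)\alpha^{-1}\bigr].$$
If $j \le n\alpha - 1 \le n\alpha - v$, there is no contribution, because $(j + v)\alpha^{-1} \le n$.
Hence $j = \lfloor n\alpha\rfloor$ is the only case that matters, and the value in that case
is $\lceil(\lfloor n\alpha\rfloor + v)\alpha^{-1}\rceil - n \le \lceil v\alpha^{-1}\rceil$.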
3.19 If and only if b is an integer. (If b is an integer, $\log_b x$ is a continuous,
increasing function that takes integer values only at integer points. If b is not
an integer, the condition fails when x = b.)

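3.20 We have $\sum_k kx[\alpha \le kx \le \beta] = x\sum_k k\bigl[\lceil\alpha/x\rceil \le k \le \lfloor\beta/x\rfloor\bigr]$, which sums
to $\frac12x\bigl(\lfloor\beta/x\rfloor\lfloor\beta/x + 1\rfloor - \lceil\alpha/x\rceil\lceil\alpha/x - 1\rceil\bigr)$.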

3.21 If $10^n \le 2^M < 10^{n+1}$, there are exactly n + 1 such powers of 2, because
there’s exactly one such k-digit power of 2 for each k. Therefore the answer
is $1 + \lfloor M\log 2\rfloor$.

Note: The number of powers of 2 with leading digit l is more difficult,
when l > 1; it’s $\sum_{0\le n\le M}\bigl(\lfloor n\log 2 - \log l\rfloor - \lfloor n\log 2 - \log(l+1)\rfloor\bigr)$.
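Both counts can be checked by inspecting the decimal strings of $2^0, \ldots, 2^M$ directly. Logarithms are to base 10, and the floating-point floors are safe here because no $n\log 2$ in range falls extremely close to a boundary $\log l$.

```python
import math

def leading_digit_counts(M):
    """counts[l] = how many of 2^0, ..., 2^M have leading decimal digit l (exact)."""
    counts = [0] * 10
    for n in range(M + 1):
        counts[int(str(2 ** n)[0])] += 1
    return counts

def predicted(M, l):
    """1 + floor(M log 2) for l = 1, otherwise the sum given in the note."""
    log2 = math.log10(2)
    if l == 1:
        return 1 + math.floor(M * log2)
    return sum(math.floor(n * log2 - math.log10(l)) - math.floor(n * log2 - math.log10(l + 1))
               for n in range(M + 1))

counts = leading_digit_counts(500)
```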

3.22 All terms are the same for n and n − 1 except the kth, where $n = 2^{k-1}q$
and q is odd; we have $S_n = S_{n-1} + 1$ and $T_n = T_{n-1} + 2^kq$. Hence $S_n = n$
and $T_n = n(n + 1)$.
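Assuming exercise 3.22 defines $S_n = \sum_{k\ge1}\lfloor n/2^k + \frac12\rfloor$ and $T_n = \sum_{k\ge1}2^k\lfloor n/2^k + \frac12\rfloor^2$ (the exercise is not reproduced here), the following sketch confirms $S_n = n$ and $T_n = n(n+1)$. It uses exact integer arithmetic, since $\lfloor n/2^k + \frac12\rfloor = \lfloor(n + 2^{k-1})/2^k\rfloor$.

```python
def S_T(n):
    """S_n = sum_{k>=1} floor(n/2^k + 1/2) and T_n = sum_{k>=1} 2^k floor(n/2^k + 1/2)^2."""
    S = T = 0
    k = 1
    while (1 << (k - 1)) <= n:            # the terms vanish once 2^(k-1) > n
        r = (n + (1 << (k - 1))) >> k     # floor((n + 2^(k-1)) / 2^k)
        S += r
        T += (1 << k) * r * r
        k += 1
    return S, T
```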

3.23 $X_n = m \iff \frac12m(m-1) < n \le \frac12m(m+1) \iff m^2 - m + \frac14 <
2n < m^2 + m + \frac14 \iff m - \frac12 < \sqrt{2n} < m + \frac12$.
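The formula $X_n = \lfloor\sqrt{2n} + \frac12\rfloor$ can be evaluated without floating point. Since $\sqrt{2n} + \frac12 = (\sqrt{8n} + 1)/2$, and $(y+1)/2$ is integral only when y is an integer, the floor can be taken inside: $X_n = \lfloor(\lfloor\sqrt{8n}\rfloor + 1)/2\rfloor$. The sketch compares this with the sequence 1, 2, 2, 3, 3, 3, ... in which m appears m times.

```python
from math import isqrt

def X_formula(n):
    """floor(sqrt(2n) + 1/2), evaluated exactly as floor((floor(sqrt(8n)) + 1) / 2)."""
    return (isqrt(8 * n) + 1) // 2

seq = [m for m in range(1, 100) for _ in range(m)]   # 1, 2, 2, 3, 3, 3, ...
```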

3.24 Let |3 = a/(a+ 1 ). Then the number of times the nonnegative integer 
m occurs in Spec(|3) is exactly one more than the number of times it occurs 
in Spec (a). Why? Because N ((3, n) = N(a, n) + n + 1 . 


3.25 Continuing the development in the text, if we could find a value of m 
such that K m ^ m, we could violate the stated inequality at n + 1 when 
n = 2m + 1 . (Also when n = 3m + 1 and n = 3m + 2.) But the existence of 
such an m = n' + 1 requires that 2K|^ n / / 2 j ^ n' or SK^n./ / 3 j ^ n', i.e., that 

K|n'/ 2 j L n '/2J or K Ln / /3J |n'/3J . 

Aha. This goes down further and further, implying that Kq ^ 0; but Kq = 1 . 

What we really want to prove is that K n is strictly greater than n, for 
all n > 0. In fact, it’s easy to prove this by induction, although it’s a stronger 
result than the one we couldn’t prove! 

(This exercise teaches an important lesson. It’s more an exercise about 
the nature of induction than about properties of the floor function.) 

3.26 Induction, using the stronger hypothesis 

< (q - 1)((_^L_) + _-|), forn^O. 

3.27 If D^ 1 = 2 m b — a, where a is 0 or 1, then D[ 1 3 | ra = 3 m b — a. 


“In trying to devise 
a proof by mathe- 
matical induction, 
you may fail for 
two opposite rea- 
sons. You may fail 
because you try to 
prove too much: 
Your P(n) is too 
heavy a burden. 

Yet you may also 
fail because you try 
to prove too little: 
Your P(n) is too 
weak a support. 

In general, you 
have to balance 
the statement of 
your theorem so 
that the support is 
just enough for the 
burden.” 

— G. Polya [297] 
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3.28 The key observation is that a n = m 2 implies a n+ 2k+i = (ttl + k) 2 + 
m— k and a n+ 2k+2 = (m+k) 2 +2m, for 0 ^ k ^ m; hence a n+ 2 m +i = (2m) 2 . 
The solution can be written in a nice form discovered by Carl Witty: 

Qn _i = 2 l + > when 2 l + l ^ n < 2 l+1 +1+1. 

3.29 D(a', L<xn.J) i s most the maximum of the right-hand side of 

s(ct', LnaJ ,V) = — s(a,n,v) + S — e — {0 or 1} — V +{0 or 1}. 

3.30 X n = a 2 " + cxT 2 " , by induction; and X n is an integer. 

This logic is seri- 3.31 Here’s an “elegant,” “impressive” proof that gives no clue about how 
ously floored. it was discovered: 

W + LyJ + L x + yJ = L x + LyJ J + L x + yJ 

^ L x +i L 2 yJj + L x + 1 L 2 yJ + 1J 
= L 2x + L 2 yJJ = L 2x J + L 2 yJ • 

But there’s also a simple, graphical proof based on the observation that we 
need to consider only the case 0 ^ x, y < 1 . Then the functions look like this 
in the plane: 

A slightly stronger result is possible, namely 

M + LyJ + L x + yJ ^ L 2x l + L 2 yJ ; 

but this is stronger only when {x} = \- If we replace (x,y) by (— x,x + y) in 
this identity and apply the reflective law (3.4), we get 

LyJ + Lx + yJ + L 2x J ^ L X J + L 2x + 2yJ • 

3.32 Let f(x) be the sum in question. Since f(x) = f(— x), we may assume 
that x ^ 0. The terms are bounded by 2 k as k — > — 00 and by x 2 /2 k as 
k — > +00, so the sum exists for all real x. 

We have f(2x) = 2^ k 2 k ~ 1 ||x/2 k_1 1| 2 = 2f(x). Let f (x) = l(x) + r(x) 
where l(x) is the sum for k ^ 0 and r(x) is the sum for k > 0. Then l(x+ 1 ) = 
l(x), and l(x) ^ 1 /2 for all x. When 0 ^ x < 1 , we have r(x) = x 2 /2 + x 2 /4 + 
• • • = x 2 , and r(x + 1 ) = (x — 1 ) 2 /2 + (x + 1 ) 2 /4 + (x + 1 ) 2 /8 + • • • = x 2 + 1. 
Hence f (x + 1 ) = f (x) + 1 , when 0 ^ x < 1 . 
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We can now prove by induction that f (x + n) = f (x) + n for all integers 
n ^ 0, when 0 ^ x < 1. In particular, f(n) = n. Therefore in general, 

f(x) = 2- m f(2 m x) = 2- m |2 m xJ +2- m f({2 m x}). But f({2 m x}) = l({2 m x}) + 
r({2 m x}) </ \ + 1; so |f (x) — x| 4 |2- m [2 m xJ -x| + 2- m -§ <C 2 ^ for all 

integers m. 

The inescapable conclusion is that f(x) = |x| for all real x. 

3.33 Let r = n— \ be the radius of the circle, (a) There are 2n — 1 horizontal 
lines and 2n— 1 vertical lines between cells of the board, and the circle crosses 
each of these lines twice. Since r 2 is not an integer, the Pythagorean theorem 
tells us that the circle doesn’t pass through the corner of any cell. Hence 
the circle passes through as many cells as there are crossing points, namely 
8n — 4 = 8r. (The same formula gives the number of cells at the edge of the 
board.) (b) f(n,k) = 4[Vr 2 — k 2 J . 

It follows from (a) and (b) that 

\m 2 - 2r ^ Y_ lV r2 ~ k 2 J ^ \ nrl > r = n — 

0<k<r 

The task of obtaining more precise estimates of this sum is a famous problem 
in number theory, investigated by Gauss and many others; see Dickson [78, 
volume 2, chapter 6]. 

3.34 (a) Let m = [lgn] . We can add 2 m — n terms to simplify the calcula- 
tions at the boundary: 


f(n) + (2 m — n)m = ^[lgk] = Y = fig Tc~| ] [1 ^k^2 m ] 

k=1 j,k 

= ^j[2H<k<2)][Kj^m] 

),k 

m 

= Y_ j 2 i_1 = 2 m (m — 1 ) + 1 . 

j=i 

Consequently f (n) = nm — 2 m + 1 . 

(b) We have [n/2] = |_(tl + 1 )/2J , and it follows that the solution to the 
general recurrence g(n) = a(n) + g( [n/2]) + g([n/2J) must satisfy Ag(n) = 
Aa(n)+Ag([n/2J). In particular, when a(n) = n— 1, Af(n) = 1 +Af([n/2J) 
is satisfied by the number of bits in the binary representation of n, namely 
[lg(n + 1 )]. Now convert from A to L. 

A more direct solution can be based on the identities [lg 2j] = [lg j] + 1 
and flg(2j - 1 )] = [lg j] + [j > 1 ], for j ^ 1. 
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This is really only a 
level 4 problem, in 
spite of the way it’s 
stated. 


3.35 (n + 1 ) 2 u! e = A n + (n + 1 ) 2 + (n + 1 ) + B n , where 

1 ) 2 n! 


A _ (n + 1 ) 2 n! (n + 1) 2 n! , (n 

An — — 1 r • • • “h 


0 ! 


is a multiple of n and 


1! 


n — 1 1 


D _ (n + 1 ) 2 n! ( (n + 1 ) 2 n! ( 
n (n + 2)1 + (n + 3)! + ' 


n+ 1 


1 


1 


< 


n + 2 \ n + 3 (n + 3)(n + 4) 
n + 1 / 1 1 

n + 2\ + n + 3 + (n + 3)(n + 3) 
(n + 1 ) (n + 3) 

= (n + 2) 2 

is less than 1 . Hence the answer is 2 mod n. 

3.36 The sum is 


+ 

+ 


2- l 4- m [m = LI« IJ ] [l = LI« kj ] [1 < lc < 2 2 " ] 

k,l,m 

= Y_ 2 -l 4- m [2 m ^l<2 m+1 ][2 l ^k<2 l+1 ][0^m<n] 

k, l,m 

= ^4- m [2 m ^l<2 m+1 ][0^rn<n] 

l, m 

= ^2- m [0^m<n] = 2(1 — 2~ n ) . 

m 

3.37 First consider the case m < n, which breaks into subcases based on 
whether m < ^n; then show that both sides change in the same way when 
m is increased by n. 

3.38 At most one Xk can be noninteger. Discard all integer Xk, and suppose 
that n are left. When {x} ^ 0, the average of (mx) as m — > oo lies between ^ 
and j] hence {mxi }+•••+ {mx n } — {mxi + • • • + mx n } cannot have average 
value zero when n > 1 . 

But the argument just given relies on a difficult theorem about uniform 
distribution. An elementary proof is possible, sketched here for n = 2: Let 
P m be the point ({mx},{m'y}). Divide the unit square 0 ^ x,y < 1 into 
triangular regions A and B according asx + y<l orx + p^l. We want to 
show that P m £ B for some m, if {x} and {y} are nonzero. If Pi £ B, we’re 
done. Otherwise there is a disk D of radius e > 0 centered at Pi such that 
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D C A. By Dirichlet’s box principle, the sequence Pi , . . . , Pn must contain 
two points with |Pk — Pj | < e and k > j, if N is large enough. 



It follows that Pk-j_i is within e of (1,1) — Pi ; hence Pk-j-i £ B. 

3.39 Replace ) by b — j and add the term j = 0 to the sum, so that exercise 
15 can be used for the sum on j. The result, 

|"x/b k ] — |~x/b k+1 ] + b — 1 , 

telescopes when summed on k. 

3.40 Let L2\Ai| = 4k + r where —2 <; r < 2, and let m = [yTi] • Then the 
following relationships can be proved by induction: 


segment 

r 

m 

X 

y 

if and only if 

W k 

-2 

2k— 1 

m(m+l ) — n — k 

k 

(2k— 1 )(2k— 1 ) <C ns: (2k— 1 )(2k) 

s k 

-1 

2k- 1 

-k 

m(m+l ) — n + k 

(2k— l)(2k) <n< (2k) (2k) 

Ek 

0 

2k 

n — m(m+l ) + k 

-k 

(2k) (2k) (2k)(2k+l) 

Nk 

1 

2k 

k 

n — m(m+l ) — k 

(2k)(2k+1) < n < (2k+l)(2k+l) 


Thus, when k ^ 1 , Wk is a segment of length 2k where the path travels west 
and y (n) = k; Sk is a segment of length 2k — 2 where the path travels south 
and x(n) = — k; etc. (a) The desired formula is therefore 

y(n) = (— 1 ) m ^(n — m(m+ 1 )) • [L2v / hJ is odd] — [ . 

(b) On all segments, k = max(|x(n)|, |y (n)|). On segments Wk and Sk we 
have x < y and n + x + y = m(m + 1 ) = (2k) 2 — 2k; on segments Ek and Nk 
we have x ^ y and n — x — y = m(m + 1 ) = (2k) 2 + 2k. Hence the sign is 

)(x(n)<y(n))_ 

3.41 Since 1/4> + 1/cf) 2 = 1, the stated sequences do partition the positive 
integers. Since the condition g(n) = f (f(n)) + 1 determines f and g uniquely, 
we need only show that [Ln4>Jc(>J + 1 = L TL( t )2 J for all n > 0. This follows 
from exercise 3, with oc = 4> and u = 1 . 
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Too easy. 


3.42 No; an argument like the analysis of the two-spectrum case in the text 
and in exercise 13 shows that a tripartition occurs if and only if l/cx + 1/(3 + 
1/y = 1 and 



+1 


for all n > 0. But the average value of { (n + 1 ) /a} is 1 /2 if a is irrational, by 
the theorem on uniform distribution. The parameters can’t all be rational, 
and if y — m/n the average is 3/2 — 1 / (2n). Hence y must be an integer, but 
this doesn’t work either. (There’s also a proof of impossibility that uses only 
simple principles, without the theorem on uniform distribution; see [155].) 

3.43 One step of unfolding the recurrence for K n gives the minimum of the 
four numbers 1 + a + a- b • Ki ( n _i_ a )/( a .t>)j , where a and b are each 2 or 3. 
(This simplification involves an application of (3.11) to remove floors within 
floors, together with the identity x + min(y , z) = min(x + y,x + z). We must 
omit terms with negative subscripts; i.e., with n — 1 — a < 0.) 

Continuing along such lines now leads to the following interpretation: 
K n is the least number > n in the multiset S of all numbers of the form 


1 + ai + ai a 2 + ai a 2 a 3 -I b ai a 2 a 3 . . . a m , 

where m^O and each is 2 or 3. Thus, 

S = {1,3,4,7,9,10,13,15,19,21,22,27,28,31,31,...}; 

the number 31 is in S “twice” because it has two representations 1 + 2 + 4 + 
8 + 16 = 1+ 3 + 9 + 18. (Incidentally, Michael Fredman [134] has shown that 
lim n ^ 00 K n /n = 1, i.e., that S has no enormous gaps.) 

3.44 Let dn q1 = Dj^ mumble (q— 1), so that Dn 1 == (qDj^ +dn“^ )/(q — 1 ) 

and Q„ 1 = rD^_V(q -1)1- Now Dj c q _ ) 1 (q - l)n <==> 5} n, and the 

results follow. (This is the solution found by Euler [116], who determined the 
q’s and d’s sequentially without realizing that a single sequence D,/ 1 1 would 
suffice.) 

3.45 Let a > 1 satisfy a+l/a = 2m. Then we find 2Y n — a 2 + or 2 , and 
it follows that Y n = [a 2 /2] . 

3.46 The hint follows from (3. 9), since 2n(n+ 1 ) = [2(n+2) 2 J. Letn+0 = 

(\/2 + \fl )m and n' + 0' = {\[2. + + \fl )m, where 0 0,0' < 1. 
Then 0 ' = 20 mod 1 = 20 — d, where d is 0 or 1 . We want to prove that 
n' = [\/2(n + t)J ; this equality holds if and only if 

0 Y 0 7 (2 — \Jl ) + \/2(l -d) < 2. 
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To solve the recurrence, note that Spec ( 1 + 1 / \fl ) and Spec(l + \/2 ) partition 
the positive integers; hence any positive integer a can be written uniquely in 
the form a = [(\/2 + \fl ) mj , where l and m are integers with m odd 
and 1^0. It follows that L n = [{V2 + + \P- + )mj . 

3.47 (a) c = — j. (b) c is an integer, (c) c = 0. (d) c is arbitrary. See the 
answer to exercise 1.2.4-40 in [207] for more general results. 

3.48 Let x :0 = 1 and x : * k+1 ^ = x[x :k J; also let = {x :k } and = [x :k J, 
so that the stated identity reads x 3 = 3x :3 + 3ai a 2 + af — 3bi b 2 + bf . Since 
Qk + bk = x :k = xbk-i for k ^ 0, we have (1 — xz)(1 + bi z + b 2 Z 2 + ■ • •) = 
1 — di z — a 2 z 2 — • • • ; thus 

1 1 + bi z + b 2 Z 2 + • • • 

1 — XZ 1 — Qi Z — 0-2Z 2 — • • • 

Take the logarithm of both sides, to separate the a’s from the b’s. Then 
differentiate with respect to z, obtaining 

x qi + 2 a. 2 Z + 3ci3Z 2 bi + 2 b 2 Z + ib^z 2 H 

1 — XZ 1 — Qi Z — 02 z 2 — • • • 1 + bi z + b 2 Z 2 + • • • 

The coefficient of z n ~' on the left is x n ; on the right it is a formula that 
matches the given identity when n = 3. 

Similar identities for the more general product xqXi . . . x n _i can also 
be derived [170]. 

3.49 (Solution by Heinrich Rolletschek.) We can replace (a, |3) by ({(3}, 
a + L|3J) without changing [naj + L n PJ- Hence the condition a = {|3} is 
necessary. It is also sufficient: Let m = [|3J be the least element of the given 
multiset, and let S be the multiset obtained from the given one by subtracting 
mn from the nth smallest element, for all n. If a = {(3}, consecutive elements 
of S differ by either 0 or 2, hence the multiset = Spec (a) determines a. 

3.50 According to unpublished notes of William A. Veech, it is sufficient to 
have a|3, (3, and 1 linearly independent over the rationals. 

3.51 H. S. Wilf observes that the functional equation f (x 2 — 1 ) = f(x) 2 would 
determine f(x) for all x 4) if we knew f(x) on any interval (c)> . . (j) + e). 

3.52 There are infinitely many ways to partition the positive integers into 
three or more generalized spectra with irrational ctk! for example, 

Spec(2a; 0) U Spec(4a; —a) U Spec(4a; —3a) U Spec(|3; 0) 

works. But there’s a precise sense in which all such partitions arise by “ex- 
panding” a basic one, Spec(a) U Spec(|3); see [158]. The only known rational 


A more interesting 
(still unsolved) 
problem: Restrict 
both oc and |3 to 
be < 1 , and ask 
when the given 
multiset determines 
the unordered 
pair {oc, (3). 
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examples, e.g., 

Spec(7; —3) U Spec(|; — 1 ) U Spec(|; 0) , 

are based on parameters like those in the stated conjecture, which is due to 
A. S. Fraenkel [128]. 

3.53 Partial results are discussed in [95, pages 30-31]. The greedy algorithm 
probably does not terminate. 

4.1 1, 2, 4, 6, 16, 12. 

4.2 Note that m p + n p = min(m p ,n p ) +max(m p ,n p ). The recurrence 
lcm (m, n) = (n/(n mod m)) lcm (n mod m, m) is valid but not really advis- 
able for computing lcm’s; the best way known to compute lcmfm, n) is to 
compute gcd(m, n) first and then to divide ran by the gcd. 

4.3 This holds if x is an integer, but 7t(x) is defined for all real x. The 
correct formula, 

7t(x) — 7t(x — 1 ) = [[x] is prime] , 


is easy to verify. 

4.4 Between y and we’d have a left-right reflected Stern-Brocot tree 
with all denominators negated, etc. So the result is all fractions m/n with 
m _L n. The condition m'n— mn' = 1 still holds throughout the construction. 
(This is called the Stern-Brocot wreath , because we can conveniently regard 
the final y as identical to the first y, thereby joining the trees in a cycle at 
the top. The Stern-Brocot wreath has interesting applications to computer 
graphics because it represents all rational directions in the plane.) 

4.5 L k = (J k ) and R k = (2°); this holds even when k < 0. (We will find 
a general formula for any product of L’s and R’s in Chapter 6.) 


After all, ‘mod y’ 
sort of means “pre- 
tend y is zero.” So if 
it already is, there’s 
nothing to pretend. 


4.6 a = b. (Chapter 3 defined x mod 0 = x, primarily so that this would 
be true.) 

4.7 We need m mod 1 0 = 0, m mod 9 = k, and m mod 8 = 1. But m can’t 
be both even and odd. 


4.8 We want 10x + 6y = 10x + y (mod 15); hence 5y = 0 (mod 15); hence 

y = 0 (mod 3). We must have y = 0 or 3, and x = 0 or 1 . 

4.9 3 2k+1 mod 4 = 3, so (3 2k+1 — 1)/2 is odd. The stated number is 

divisible by (3 7 — 1 )/2 and (3 1 1 — 1 )/2 (and by other numbers). 

4.10 999(1 — y)(1 — yy) = 648. 
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4.11 cr(0) = 1; cr( 1 ) = —1; cr(n) = 0 for n > 1. (Generalized Mobius 

functions defined on arbitrary partially ordered structures have interesting 
and important properties, first explored by Weisner [366] and developed by 
many other people, notably Gian-Carlo Rota [313].) 

4 - 12 Ld\m Lk\d M-( d/k) g(k) = Lk\m Ld\(m/k) H(d) g(k) = L kXm g(k)x 
[m/k = 1] = g(m), by (4.7) and (4.9). 

4.13 (a) Up ^ 1 for all p; (b) pfn) 7^ 0. 

4.14 True when k > 0. Use (4.12), (4.14), and (4.15). 

4.15 No. For example, e n mod 5 = [2 or 3]; e n mod 11 = [2, 3, 7, or 10]. 

4.16 1/ei + l/e2 + -- - + l/e n = l — l/(e n (e n — 1 )) =1 — 1/(e n +i — 1 ). 

4.17 We have f n mod f m = 2; hence gcd(f n ,f m ) = gcd(2,f m ) = 1. (Inci- 
dentally, the relation f n = fof 1 . . . f n -i + 2 is very similar to the recurrence 
that defines the Euclid numbers e n .) 

4.18 If n = qm and q is odd, 2 n + 1 = (2 m + 1 )(2 n - m -2 n - 2m + 2 m + 1). 

4.19 The first sum is 7t(n), since the summand is [k+ 1 is prime]. The inner 
sum in the second is £i<tc<m^\ ra ]| so ^ * s g rea t er than 1 if and only if 
m is composite; again we get 7t(n). Finally [{m/n}] = [n\m], so the third 
sum is an application of Wilson’s theorem. To evaluate 7t(n) by any of these 
formulas is, of course, sheer lunacy. 

4.20 Let pi = 2 and let p n be the smallest prime greater than 2 Pn ~ 1 . Then 
2Pn-i < p n < 2 Pn ~' +1 , and it follows that we can take b = limr^oo lg |n| p n 
where lg ,n ' is the function lg iterated n times. The stated numerical value 
comes from P2 = 5, P3 = 37. It turns out that P4 = 2 37 + 9, and this gives 
the more precise value 

b « 1.2516475977905 

(but no clue about P5). 

4.21 By Bertrand’s postulate, P n < 10 n . Let 
K = XlO^Pk = .200300005... . 

k^1 

Then 10 n2 K = P n + fraction (mod TO 211 - 1 ). 

4.22 (b mn - 1 )/(b - 1 ) = ((b m - 1 )/(b - 1 )) (b mR - m + • • • + 1 ). [The only 
prime numbers of the form (10 p — 1)/9 for p < 20000 occur when p = 2, 19, 
23, 317, 1031.] Numbers of this form are called “repunits.” 
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4.23 p(2k+ 1) = 0; p(2k) = p(k) + 1, for k ^ 1. By induction we can show 
that p(n) = p(n — 2 m ), if n > 2 m and m > p(n). The kth Hanoi move is disk 
p(k), if we number the disks 0, 1 , . . . , n — 1 . This is clear if k is a power of 2. 
And if 2 m < k < 2 m+1 , we have p(k) < m; moves k and k — 2 m correspond 
in the sequence that transfers m + 1 disks in T m + 1 + T m steps. 

4.24 The digit that contributes dp m to n contributes dp m_1 + • • • + d = 
d(p m — 1 )/(p - 1) to e p (n!), hence e p (n!) = (n-v p (n))/(p - 1). 

4.25 m\\n ■£=£> m p = 0 or m p = n p , for all p. It follows that (a) is true. 
But (b) fails, in our favorite example m = 12, n = 18. (This is a common 
fallacy.) 

4.26 Yes, since Sn defines a subtree of the Stern-Brocot tree. 

4.27 Extend the shorter string with M’s (since M lies alphabetically be- 
tween L and R) until both strings are the same length, then use dictionary 
order. For example, the topmost levels of the tree are LL < LM < LR < 
MM < RL < RM < RR. (Another solution is to append the infinite string 
RL°° to both inputs, and to keep comparing until finding L < R.) 

4.28 We need to use only the first part of the representation: 

RRRLLLLLLLRRRR R R 

12347101316192225476991 113 135 

1 > 1 > 1 > 1 > 2 > 3 ’ 4 ’ 5 > 6 ’ 7 > 8 ’ 15 ’ 22 ’ 29 > 36 > 43 > ' ' ‘ ' 

The fraction y appears because it’s a better upper bound than ' , not because 
it’s closer than y. Similarly, ^ is a better lower bound than y. The simplest 
upper bounds and the simplest lower bounds all appear, but the next really 
good approximation doesn’t occur until just before the string of R’s switches 
back to L. 

4.29 1 /a. To get 1 — x from x in binary notation, we interchange 0 and 1 ; to 
get 1 /a from oc in Stern-Brocot notation, we interchange L and R. (The finite 
cases must also be considered, but they must work since the correspondence 
is order preserving.) 

4.30 The m integers x £ [A . . A + m) are different mod m; so their residues 
(x mod mi,...,x mod m r ) run through all mi . . . m r = m possible values, 
one of which must be equal to ( ai mod mi , . . . , a T mod m r ) by the pigeonhole 
principle. 

4.31 A number in radix b notation is divisible by d if and only if the sum 

of its digits is divisible by d, whenever b = 1 (mod d). This follows because 
(a m . . . Qo)b = a m b m H b a 0 b° = a m H b a 0 . 

4.32 The cp(m) numbers { kn. mod m | k 1 m and 0 ^ k < m} are the num- 
bers {k | k _L m and 0 k < m} in some order. Multiply them together and 
divide by Ilo $ k<m, klm 
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4.33 Obviously h( 1 ) = 1 . If m _L n then h.(mn) = Hd\mn f(d) g(mn/d) = 

L c \m,d\n f ( cd ) 9 ( ( rrv/ c)(n/ d)) = I c \mld\n f ( c ) g(m/c)f(d) g(n/d); this 
is h(m) h(n), since c _L d for every term in the sum. 

4.34 g(m) = L d \m f ( d ) = Id\m f ( m / d ) = Ld^i f ( m / d ) if f(x) is zero 
when x is not an integer. 

4.35 The base cases are 


1(0, n) = 0 ; I(m, 0) = 1 . 

When m, n > 0, there are two rules, where the first is trivial if m > n and 
the second is trivial if m < n: 


I(m, n) = I(m, n mod m) — Ltl/ttlJ I(n mod m, m) ; 
I(m, n) = I(mmodn,n), 


4.36 A factorization of any of the given quantities into nonunits must have 
m 2 — 10n 2 = ±2 or ±3, but this is impossible mod 10. 

4.37 Let a n = 2~ n ln(e n — j) and b n = 2~ n ln(e n + j). Then 

e n = [E 2 " + \\ ■$=$ a n ^ InE < b n . 


And a n _i < a n < b n < b n _i, so we can take E = limn^^oo e aTl . In fact, it 
turns out that 


E 2 = 



i y /2 
(2e n -l) 2 J 


a product that converges rapidly to (1 .264084735305301 11 . . . ) 2 . But these 
observations don’t tell us what e n is, unless we can find another expression 
for E that doesn’t depend on Euclid numbers. 

4.38 Let r = n mod m. Then -b n = (a m -b m )(a n - m b° + a n - 2m b m + 
■ ■ ■ + a r b n ~ m ~ r ) + b m L n / m J ( a r - b r ). 

4.39 If ai ... at and bi . . . b u are perfect squares, so is 
Qi ... a t bi . . .b u /c 2 . . .c 2 , 


where {ai , . . . , at}H{bi , . . . , b u } = {ci , . . . , c v }. (It can be shown, in fact, that 
the sequence (S ( 1 ) , S(2), S(3), . . . , } contains every nonprime positive integer 
exactly once.) 
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John .316 


banner displayed 
during the 1993 
World Series, 
when John Kruk 
came to bat. 


4.40 Let f(n) = Ili s :k : cTi,pXk k = n! /P [ri/pJ L n /Pj ! and 9( n ) = n!/p e * (n!) . 

Then 

g(n) = f(n)f(Ln/pJ)f(Ln/p 2 j) ... = f(n)g(|n/pj) . 

Also f(n) = a 0 !(p — 1 ) ! L n -/'PJ = q 0 !(— 1 (mod p), and e p (n!) = [rt/pj + 

e p ( [n/pj !) . These recurrences make it easy to prove the result by induction. 
(Several other solutions are possible.) 

4.41 (a) If n 2 = —1 (mod p) then (n 2 )^ p_1 ^ 2 = —1; but Fermat says it’s 

+ 1. (b) Let n = ((p — 1 )/2) !; we have n= (-l)tp- 1 )/ 2 n is , k<p/2 ( p -k.) = 
(p — 1 )!/n, hence n 2 = (p — 1 )!. 

4.42 First we observe that k _L l •£=£• k _L l + ak for any integer a, since 
gcd(k, l) = gcd(k, l + ak) by Euclid’s algorithm. Now 

m _L n and n' _L n <!==? mn' _L n 

mn' + nm' _L n . 

Similarly 

m/ _L n' and n _L n' <£=4> mn' + nm' _L n' . 

Hence 

min and m/ _L n' and n _L n' <£=4> rnn'+um/ J_ nn' . 

4.43 We want to multiply by L _1 R, then by R _1 L _1 RL, then L _1 R, then 
R~ 2 L _1 RL 2 , etc.; the nth multiplier is R-p^L - 1 RL p ^ n ', since we must cancel 
p(n) R’s. And R _m L _1 RL m = (° 2m ^). 

4.44 We can find the simplest rational number that lies in 

[0.3155. .0.3165) = 

by looking at the Stern-Brocot representations of and and stopping 
just before the former has L where the latter has R: 

(mi ,Hi ,m2,H2) := (631,2000,633,2000); 

while mi > ui or m 2 < U 2 do 

if m 2 < U 2 then (output(L); (ui,n 2 ) := (ni ,n 2 ) — (mi , m 2 )) 

else (output(R); (mi, m 2 ) := (mi , m 2 ) — (ni , m)) . 

The output is LLLRRRRR = ^ « .3158. Incidentally, an average of .334 
implies at least 287 at bats. 
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4.45 x 2 = x (mod 10 n ) 4=4- x(x — 1) = 0 (mod 2 n ) and x(x — 1) = 0 

(mod 5 n ) x mod 2 n = [Oor 1] and x mod 5 n = [Oor 1], (The last step 

is justified because x(x — 1 ) mod 5 = 0 implies that either x or x — 1 is a 
multiple of 5, in which case the other factor is relatively prime to 5 n and can 
be divided from the congruence.) 

So there are at most four solutions, of which two (x = 0 and x = 1) 
don’t qualify for the title “n-digit number” unless n = 1 . The other two 
solutions have the forms x and 1 0 n + 1 — x, and at least one of these numbers 
is 10 n_1 . When n = 4 the other solution, 10001 — 9376 = 625, is not a 
four-digit number. We expect to get two n-digit solutions for about 90% of 
all n, but this conjecture has not been proved. 

(Such self-reproducing numbers have been called “automorphic.”) 

4.46 (a) If )') — k'k = gcd(j,k), we have u k k n gcd|,,k * = n? ’ = 1 and 
n k k = 1. (b) Let n = pq, where p is the smallest prime divisor of n. If 
2 n = 1 (mod n) then 2 n = 1 (mod p). Also 2 P ~ 1 = 1 (mod p); hence 
2g cd (p-bn) = ] (mod p). But gcd(p — 1,n) = 1 by the definition of p. 

4.47 If n m ~ 1 = 1 (mod m) we must have n _L m. If n k = nf for some 
1 ^ j < k < m, then n k ~^ = 1 because we can divide by nk Therefore if the 
numbers n 1 mod m, . . . , n m_1 mod m. are not distinct, there is a k < m — 1 
with n k = 1 . The least such k divides m— 1 , by exercise 46(a). But then kq = 
(m— l)/p for some prime p and some positive integer q; this is impossible, 
since n kq ^ 1. Therefore the numbers n 1 mod m, ..., n m_1 mod m are 
distinct and relatively prime to m. Therefore the numbers 1 , . . . , m — 1 are 
relatively prime to m, and m must be prime. 

4.48 By pairing numbers up with their inverses, we can reduce the product 
(mod m) to ]~[i<n<m n 2 mod m=i n - Now we can use our knowledge of the 
solutions to n. 2 mod m = 1 . By residue arithmetic we find that the result is 
m — 1 if m = 4, p k , or 2p k (p > 2); otherwise it’s +1. 

4.49 (a) Either m < n (®(N — 1) cases) or m = n (one case) or m > n 
(®(N — 1) again). Hence R(N) =2®(N — 1) + 1. (b) From (4.62) we get 

2<D(N — 1) + 1 = 1 + Y_ h(d)LN/dJLN/d-lJ ; 

d^1 

hence the stated result holds if and only if 
Y_ p(d)|N/dJ = 1 , for N ^ 1. 

d>1 

And this is a special case of (4.61) if we set f(x) = [x^ 1]. 
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“Die ganzen Zahlen 
hat der liebe Gott 
gemacht, alles 
andere ist 
Menschenwerk. ” 

— L. Kronecker [365] 


4.50 (a) If f is any function, 

L w = L L f(k) [d = gcd(k, m)] 

O^kcm d\m O^kcm 

= Z L f(k) [k/d_L m/d] 

d\m 0^k<m 

= L L f(kd) [klm/d] 

d\m 0^k<m/d 

= L L f(km/d) [k_L d] ; 

d\m 0^k<d 

we saw a special case of this in the derivation of (4.63). An analogous deriva- 
tion holds for ]~[ instead of Thus we have 

z m -*\ = n (z-“> k ) = n n 

O^kcm d\m 0^k<d d\m 

k_Ld 

because u> m/d = e 2m/d . 

Part (b) follows from part (a) by the analog of (4.56) for products 
instead of sums. Incidentally, this formula shows that T m (z) has integer 
coefficients, since T m (z) is obtained by multiplying and dividing polynomials 
whose leading coefficient is 1 . 

4.51 (xi + bx n ) p = X kl+ .,. +kn=p p!/(k 1 !...k n !)x 1 ( 1 and the 

coefficient is divisible by p unless some kj = p. Hence (xi + ■ ■ ■ + x n ) p = 
x p + • • • + Xn (mod p). Now we can set all the x’s to 1 , obtaining n p = n. 

4.52 If p > n there is nothing to prove. Otherwise x _L p, so x k(p_1 ' = 1 
(mod p); this means that at least [(n— 1 )/ (p — 1 ) J of the given numbers are 
multiples of p. And (n — 1 )/(p — 1 ) n/p since n ^ p. 

4.53 First show that if m 6 and m is not prime then (m— 2)1 = 0 (mod m). 
(If m = p 2 , the product for (m — 2)! includes p and 2p; otherwise it includes 
d and m/d where d < m/d.) Next consider cases: 

Case 0, n < 5. The condition holds for n = 1 only. 

Case 1, n. 5? 5 and n is prime. Then (n — 1 )!/(n + 1 ) is an integer and 
it can’t be a multiple of n. 

Case 2, n jpt 5, n is composite, and n + 1 is composite. Then n. and 
n + 1 divide (n — 1 )!, and nln+ 1; hence n(n + 1 )\(n — 1 )!. 

Case 3, n 5, n. is composite, and n + 1 is prime. Then (n — 1 )! = 1 
(mod n + 1) by Wilson’s theorem, and 


_(n — 1 )!/(n + 1 )J = ((n — 1)! + n)/(n+ 1) ; 
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this is divisible by n. 

Therefore the answer is: Either n = 1 or n 7^ 4 is composite. 

4.54 £2(1000!) > 500 and £5 (1000!) =249, hence 1000! = a-10 249 for some 
even integer a. Since 1000 = (1300)5, exercise 40 tells us that a • 2 249 = 
1 000! /5 249 = —1 (mod 5). Also 2 249 = 2, hence a = 2, hence a mod 10 = 2 
or 7; hence the answer is 2- 10 249 . 

4.55 One way is to prove by induction that P2n/Pn( n + 1) is an integer; 
this stronger result helps the induction go through. Another way is based 
on showing that each prime p divides the numerator at least as often as it 
divides the denominator. This reduces to proving the inequality 

2 n n 

Y_ LVraJ ^ 4 Y_ Llc/m.J , 

k=1 k=1 

which follows from 

[(2n— 1)/mJ + [2n/mJ ^ L n / m J • 

The latter is true when 0 ^ n < m, and both sides increase by 4 when n is 
increased by m. 

4.56 Let f(m) = Hk^T 1 niin(k,2n— k)[m\k], g(m) = X!k=i (2n— 2k— 1) x 
[m\(2k+ 1)]. The number of times p divides the numerator of the stated 
product is f(p) + f (p 2 ) + f(p 3 ) +■*■•, and the number of times p divides the 
denominator is g(p) + g(p 2 ) + g(p 3 ) H — • . But f(m) = g(m) whenever m is 
odd, by exercise 2.32. The stated product therefore reduces to 2 n,n ~ 1 ', by 
exercise 3.22. 

4.57 The hint suggests a standard interchange of summation, since 

[d\m] = ^ [m=dk] = [n./ d.J . 

1 <)m^n 0<k<:n/d 

Calling the hinted sum Z(n), we have 

I(m + n) — I(m) — I(n) = ^ cp ( d) . 

d£S(m,n) 

On the other hand, we know from (4.54) that L(n) = tTT-(tl + 1). Hence 
I(m + n) — I(m) — I(n) = mn. 

4.58 The function f(m) is multiplicative, and when m = p k it equals 1 + 
p + • • ■ + p k . This is a power of 2 if and only if p is a Mersenne prime and 
k = 1 . For k must be odd, and in that case the sum is 

(1 +p)(l +p 2 +p 4 + ---+p k - 1 ) 
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and (k — 1 )/2 must be odd, etc. The necessary and sufficient condition is that 
m be a product of distinct Mersenne primes. 

4.59 Proof of the hint: If $n = 1$ we have $x_1 = \alpha = 2$, so there's no problem.
If $n > 1$ we can assume that $x_1 \le \cdots \le x_n$. Case 1: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \le 1$ and $x_n > x_{n-1}$. Then we can find $\beta \ge x_n - 1 \ge x_{n-1}$ such
that $x_1^{-1} + \cdots + x_{n-1}^{-1} + \beta^{-1} = 1$; hence $x_n \le \beta + 1 \le e_n$ and $x_1 \ldots x_n \le
x_1 \ldots x_{n-1}(\beta + 1) \le e_1 \ldots e_n$, by induction. There is a positive integer $m$
such that $\alpha = x_1 \ldots x_n/m$; hence $\alpha \le e_1 \ldots e_n = e_{n+1} - 1$, and we have
$x_1 \ldots x_n(\alpha + 1) \le e_1 \ldots e_n e_{n+1}$. Case 2: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \le 1$
and $x_n = x_{n-1}$. Let $a = x_n$ and $a^{-1} + (a-1)^{-1} = (a-2)^{-1} + \zeta^{-1}$. Then
we can show that $a \ge 4$ and $(a-2)(\zeta + 1) \ge a^2$. So there's a $\beta \ge \zeta$ such
that $x_1^{-1} + \cdots + x_{n-2}^{-1} + (a-2)^{-1} + \beta^{-1} = 1$; it follows by induction that
$$x_1 \ldots x_n \le x_1 \ldots x_{n-2}(a-2)(\zeta+1) \le x_1 \ldots x_{n-2}(a-2)(\beta+1) \le e_1 \ldots e_n,$$
and we can finish as before. Case 3: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} > 1$.
Let $a = x_n$, and let $\alpha^{-1} + a^{-1} = (a-1)^{-1} + \beta^{-1}$. It can be shown that
$(a-1)(\beta + 1) > a(\alpha + 1)$, because this identity is equivalent to
$$a\alpha^2 - a^2\alpha + a\alpha - a^2 + \alpha + a > 0,$$
which is a consequence of $a\alpha(\alpha - a) + (1 + \alpha)a \ge (1 + \alpha)a > a^2 - \alpha$. Hence
we can replace $x_n$ and $\alpha$ by $a - 1$ and $\beta$, repeating this transformation until
cases 1 or 2 apply.

Another consequence of the hint is that $1/x_1 + \cdots + 1/x_n < 1$ implies
$1/x_1 + \cdots + 1/x_n \le 1/e_1 + \cdots + 1/e_n$; see exercise 16.


“Man made
the integers:

All else is
Dieudonné.”

— R. K. Guy


4.60 The main point is that $\theta < \frac23$. Then we can take $p_1$ sufficiently large
(to meet the conditions below) and $p_n$ to be the least prime greater than
$p_{n-1}^3$. With this definition let $a_n = 3^{-n}\ln p_n$ and $b_n = 3^{-n}\ln(p_n + 1)$. If
we can show that $a_{n-1} \le a_n < b_n \le b_{n-1}$, we can take $P = \lim_{n\to\infty} e^{a_n}$ as
in exercise 37. But this hypothesis is equivalent to $p_{n-1}^3 \le p_n < (p_{n-1} + 1)^3$.
If there's no prime $p_n$ in this range, there must be a prime $p < p_{n-1}^3$ such
that $p + cp^\theta > (p_{n-1} + 1)^3$. But this implies that $cp^\theta > 3p^{2/3}$, which is
impossible when $p$ is sufficiently large.

We can almost certainly take $p_1 = 2$, since all available evidence indi-
cates that the known bounds on gaps between primes are much weaker than
the truth (see exercise 69). Then $p_2 = 11$, $p_3 = 1361$, $p_4 = 2521008887$, and
$1.306377883863 < P < 1.306377883869$.

4.61 Let $\hat m$ and $\hat n$ be the right-hand sides; observe that $\hat m n' - m'\hat n = 1$,
hence $\hat m \perp \hat n$. Also $\hat m/\hat n > m'/n'$ and
$$N = \frac{n+N}{n'}\,n' - n \ge \hat n > \Bigl(\frac{n+N}{n'} - 1\Bigr)n' - n = N - n' \ge 0.$$
So we have $\hat m/\hat n \ge m''/n''$. If equality doesn't hold, we have
$n'' = (\hat m n' - m'\hat n)n'' = n'(\hat m n'' - m''\hat n) + \hat n(m''n' - m'n'') \ge n' + \hat n > N$, a contradiction.

Incidentally, this exercise implies that $(m + m'')/(n + n'') = m'/n'$,
although the former fraction is not always reduced.
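One way to sanity-check the successor rule in 4.61 is to generate each Farey series by brute force and compare every term with the value computed from the two terms before it. This is a minimal sketch. It assumes the exercise's rule is $m'' = \lfloor (n+N)/n'\rfloor m' - m$, $n'' = \lfloor (n+N)/n'\rfloor n' - n$, which the answer above does not restate, and the helper names are hypothetical.

```python
from fractions import Fraction

def farey(N):
    """All reduced fractions in [0, 1] with denominator <= N, in increasing order."""
    return sorted({Fraction(m, n) for n in range(1, N + 1) for m in range(n + 1)})

def successor(prev, cur, N):
    """Apply the (assumed) 4.61 rule to consecutive terms prev < cur of F_N."""
    q = (prev.denominator + N) // cur.denominator
    return (q * cur.numerator - prev.numerator,
            q * cur.denominator - prev.denominator)

def rule_holds(N):
    """Check the rule on every consecutive triple of F_N."""
    seq = farey(N)
    return all(successor(a, b, N) == (c.numerator, c.denominator)
               for a, b, c in zip(seq, seq[1:], seq[2:]))

for N in range(1, 9):
    print(N, rule_holds(N))
```

The comparison is done on (numerator, denominator) pairs rather than on values, so the check also confirms that the formula produces a fraction already in lowest terms, as the answer proves.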


4.62 $2^{-1} + 2^{-2} + 2^{-3} - 2^{-6} - 2^{-7} + 2^{-12} + 2^{-13} - 2^{-20} - 2^{-21} + 2^{-30} + 2^{-31} - 2^{-42} - 2^{-43} + \cdots$
can be written
$$\frac12 + 3\sum_{k\ge0}\bigl(2^{-4k^2-6k-3} - 2^{-4k^2-10k-7}\bigr).$$
This sum, incidentally, can be expressed in closed form using the “theta function”
$\theta(\lambda, z) = \sum_k e^{-\pi\lambda k^2 + 2izk}$; we have

e 1 + |0(^ln2, 3iln2) - T |g0(3 ; ln2, 5iln2) . 
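The regrouping in 4.62 works because $3\cdot 2^{-j} = 2^{-j+1} + 2^{-j}$, so each term of the $k$-sum produces one adjacent pair of powers of 2 in the listed series. The sketch below checks this with exact rational arithmetic. The function name is illustrative only.

```python
from fractions import Fraction

def closed_form(K):
    """1/2 + 3 * sum_{k<K} (2^(-4k^2-6k-3) - 2^(-4k^2-10k-7)), computed exactly."""
    total = Fraction(1, 2)
    for k in range(K):
        total += 3 * (Fraction(1, 2 ** (4 * k * k + 6 * k + 3))
                      - Fraction(1, 2 ** (4 * k * k + 10 * k + 7)))
    return total

plus = (1, 2, 3, 12, 13, 30, 31)     # exponents of the "+" terms listed in 4.62
minus = (6, 7, 20, 21, 42, 43)       # exponents of the "-" terms
listed = (sum(Fraction(1, 2 ** e) for e in plus)
          - sum(Fraction(1, 2 ** e) for e in minus))
print(closed_form(3) == listed)
```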


4.63 Any $n > 2$ either has an odd prime divisor $d$ or is divisible by $d = 4$. In
either case, a solution with exponent $n$ implies a solution $(a^{n/d})^d + (b^{n/d})^d =
(c^{n/d})^d$ with exponent $d$. Since $d = 4$ has no solutions, $d$ must be prime.

The hint follows from the binomial theorem, since $(a^p + (x-a)^p)/x \equiv
pa^{p-1} \pmod x$ when $p$ is odd. The smallest counterexample, if (4.46) fails,
has $a \perp x$. If $x$ is not divisible by $p$ then $x$ is relatively prime to $c^p/x$; this
means that whenever $q$ is prime and $q^e \backslash\backslash x$ and $q^f \backslash\backslash c$, we have $e = fp$. Hence
$x = m^p$ for some $m$. On the other hand if $x$ is divisible by $p$, then $c^p/x$ is
divisible by $p$ but not by $p^2$, and $c^p$ has no other factors in common with $x$.

4.64 Equal fractions in $\mathcal P_N$ appear in “organ-pipe order”:
$$\frac{2m}{2n},\ \frac{4m}{4n},\ \ldots,\ \frac{rm}{rn},\ \ldots,\ \frac{3m}{3n},\ \frac mn.$$

Suppose that $\mathcal P_N$ is correct; we want to prove that $\mathcal P_{N+1}$ is correct. This
means that if $kN$ is odd, we want to show that

k- 1 

NTT = ?N ' kN; 

if kN is even, we want to show that 

k — 1 

,kN — 1 I'M .kN ^ — -j ™N,kN ^N^N + I • 

I have discovered a
wonderful proof of
Fermat’s Last Theorem,
but there’s no
room for it here.

In both cases it will be helpful to know the number of fractions that are
strictly less than $(k-1)/(N+1)$ in $\mathcal P_N$; this is
$$\sum_{n=1}^{N}\Bigl\lceil\frac{(k-1)n}{N+1}\Bigr\rceil = \sum_{n=1}^{N}\Bigl\lfloor\frac{(k-1)n+N}{N+1}\Bigr\rfloor = \sum_{n=0}^{N}\Bigl\lfloor\frac{(k-1)n+N}{N+1}\Bigr\rfloor = d\Bigl\lfloor\frac Nd\Bigr\rfloor + \frac{(k-2)N}{2} + \frac{d-1}{2}$$
by (3.32), where $d = \gcd(k-1, N+1)$. And this reduces to $\frac12(kN - d + 1)$,
since $N \bmod d = d - 1$.

Furthermore, the number of fractions equal to $(k-1)/(N+1)$ in $\mathcal P_N$
that should precede it in $\mathcal P_{N+1}$ is $\frac12(d - 1 - [d \text{ even}])$, by the nature of organ-pipe order.

If $kN$ is odd, then $d$ is even and $(k-1)/(N+1)$ is preceded by $\frac12(kN - 1)$
elements of $\mathcal P_N$; this is just the correct number to make things work. If $kN$
is even, then $d$ is odd and $(k-1)/(N+1)$ is preceded by $\frac12 kN$ elements
of $\mathcal P_N$. If $d = 1$, none of these equals $(k-1)/(N+1)$ and $\mathcal P_{N,kN}$ is ‘<’;
otherwise $(k-1)/(N+1)$ falls between two equal elements and $\mathcal P_{N,kN}$ is ‘=’.
(C. S. Peirce [288] independently discovered the Stern–Brocot tree at about
the same time as he discovered $\mathcal P_N$.)
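The count $\frac12(kN - d + 1)$ obtained from (3.32) can be checked by brute force. The sketch assumes that the fractions of $\mathcal P_N$ are the pairs $m/n$ with $1 \le n \le N$ and $m \ge 0$, counted with multiplicity, which is what the ceiling sum above counts. The definition of $\mathcal P_N$ is not restated here, so treat that as an assumption.

```python
from math import gcd

def count_below(k, N):
    """Pairs (m, n) with 1 <= n <= N, m >= 0 and m/n < (k-1)/(N+1)."""
    return sum(1 for n in range(1, N + 1)
                 for m in range((k - 1) * n // (N + 1) + 1)
                 if m * (N + 1) < (k - 1) * n)      # exact integer comparison

def predicted(k, N):
    """The closed form (kN - d + 1)/2 with d = gcd(k-1, N+1)."""
    return (k * N - gcd(k - 1, N + 1) + 1) // 2

print(count_below(3, 3), predicted(3, 3))
```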


“No square less than
$25 \times 10^{14}$ divides a
Euclid number.”

— Ilan Vardi


4.65 The analogous question for the (analogous) Fermat numbers f n is a 
famous unsolved problem. This one might be easier or harder. 

4.66 It is known that no square less than $36 \times 10^{18}$ divides a Mersenne
number or Fermat number. But there has still been no proof of Schinzel’s
conjecture that there exist infinitely many squarefree Mersenne numbers. It
is not even known if there are infinitely many $p$ such that $p \backslash\backslash (a \pm b)$, where
all prime factors of $a$ and $b$ are $\le 31$.


4.67 M. Szegedy has proved this conjecture for all large n; see [348], [95, 
pp. 78-79], and [55]. 


4.68 This is a much weaker conjecture than the result in the following ex- 
ercise. 


4.69 Cramér [66] showed that this conjecture is plausible on probabilistic
grounds, and computational experience bears this out: Brent [37] has shown
that $P_{n+1} - P_n \le 602$ for $P_{n+1} < 2.686 \times 10^{12}$. But the much weaker bounds
in exercise 60 are the best that have been published so far [255]. Exercise 68
has a “yes” answer if $P_{n+1} - P_n < 2P_n$ for all sufficiently large $n$. According
to Guy [169, problem A8], Paul Erdős offers \$10,000 for proof that there are
infinitely many $n$ such that
$$P_{n+1} - P_n > \frac{c \ln n \ln\ln n \ln\ln\ln\ln n}{(\ln\ln\ln n)^2}$$
for all $c > 0$.


4.70 This holds if and only if V2(n) = V3(n), according to exercise 24. The 
methods of [96] may help to crack this conjecture. 

4.71 When $k = 3$ the smallest solution is $n = 4700063497 = 19 \cdot 47 \cdot 5263229$;
no other solutions are known in this case. 
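The number quoted in 4.71 is easy to check by machine, because modular exponentiation needs only about $\log_2 n$ squarings. The sketch assumes that the congruence in the exercise is $2^n \equiv k \pmod n$. The answer above does not restate the exercise, so that reading is an assumption.

```python
n = 4700063497
factors = (19, 47, 5263229)

product = 1
for f in factors:
    product *= f

# Python's three-argument pow performs fast modular exponentiation.
residue = pow(2, n, n)
print(product == n, residue)
```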
4.72 This is known to be true for infinitely many values of $a$, including $-1$
(of course) and $0$ (not so obviously). Lehmer [244] has a famous conjecture
that $\varphi(n) \backslash (n-1)$ if and only if $n$ is prime.

4.73 This is known to be equivalent to the Riemann hypothesis (that the
complex zeta function $\zeta(z)$ is nonzero when the real part of $z$ is greater
than $1/2$).

4.74 Experimental evidence suggests that there are about $p(1 - 1/e)$ distinct
values, just as if the factorials were randomly distributed modulo $p$.

5.1 $(11)_r^4 = (14641)_r$, in any number system of radix $r \ge 7$, because of the
binomial theorem.
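In 5.1, $(11)_r = r + 1$, and the binomial theorem gives $(r+1)^4 = r^4 + 4r^3 + 6r^2 + 4r + 1$. The digits $1, 4, 6, 4, 1$ are legal only when $r \ge 7$, because 6 has to be a digit. A small check, using an illustrative helper:

```python
def from_digits(digits, r):
    """Value of a digit list (most significant first) in radix r."""
    value = 0
    for d in digits:
        value = value * r + d
    return value

binomial_row = [1, 4, 6, 4, 1]     # C(4,0), ..., C(4,4)
for r in (7, 8, 10, 16):
    print(r, from_digits(binomial_row, r) == (r + 1) ** 4)
```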

5.2 The ratio $\binom{n}{k+1}\big/\binom nk = (n-k)/(k+1)$ is $\le 1$ when $k \ge \lfloor n/2\rfloor$ and $\ge 1$
when $k < \lceil n/2\rceil$, so the maximum occurs when $k = \lfloor n/2\rfloor$ and $k = \lceil n/2\rceil$.

5.3 Expand into factorials. Both products are equal to f(n)/f(n — k)f(k), 
where f(n) = (n + 1 )! n! (n — 1 )!. 

5.4 $\binom{-1}{k} = (-1)^k\binom{k+1-1}{k} = (-1)^k\binom kk = (-1)^k[k \ge 0]$.

5.5 If $0 < k < p$, there’s a $p$ in the numerator of $\binom pk$ with nothing to
cancel it in the denominator. Since $\binom pk = \binom{p-1}{k} + \binom{p-1}{k-1}$, we must have
$\binom{p-1}{k} \equiv (-1)^k \pmod p$, for $0 \le k < p$.
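Both facts used in 5.5 can be checked directly for small primes. First, $p \mid \binom pk$ for $0 < k < p$. Then $\binom{p-1}{k} \equiv (-1)^k$, which follows by induction on $k$ from Pascal's rule. A composite modulus such as 9 fails the first test, which shows where primality is needed. The function name below is hypothetical.

```python
from math import comb

def check_prime(p):
    """Test the two congruences from the answer to 5.5 for a given modulus p."""
    divisible = all(comb(p, k) % p == 0 for k in range(1, p))
    alternating = all((comb(p - 1, k) - (-1) ** k) % p == 0 for k in range(p))
    return divisible and alternating

print([p for p in (2, 3, 5, 7, 9, 11, 13) if check_prime(p)])
```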

5.6 The crucial step (after second down) should be 



The original derivation forgot to include this extra term, which is [n = 0] . 

5.7 Yes, because r— = (—1 ) k /(— r — 1 )— . We also have 

$r^{\overline k}(r + \frac12)^{\overline k} = (2r)^{\overline{2k}}/2^{2k}$.

5.8 $f(k) = (k/n - 1)^n$ is a polynomial of degree $n$ whose leading coefficient
is $n^{-n}$. By (5.40), the sum is $n!/n^n$. When $n$ is large, Stirling’s approxima-
tion says that this is approximately $\sqrt{2\pi n}/e^n$. (This is quite different from
$(1 - 1/e)^n$, which is what we get if we use the approximation $(1 - k/n)^n \approx e^{-k}$,
valid for fixed $k$ as $n \to \infty$.)
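A numerical check of 5.8 is shown below. It assumes that the sum in the exercise is $\sum_k \binom nk(-1)^k(1 - k/n)^n$, consistent with the remark about $(1-k/n)^n \approx e^{-k}$. The answer gives only the result, so this form of the sum is an assumption. Exact fractions confirm $n!/n^n$, and a float comparison shows that the Stirling estimate is already close for moderate $n$.

```python
from fractions import Fraction
from math import comb, exp, factorial, pi, sqrt

def alternating_sum(n):
    """sum_k C(n,k) (-1)^k (1 - k/n)^n, computed exactly."""
    return sum(comb(n, k) * (-1) ** k * (1 - Fraction(k, n)) ** n
               for k in range(n + 1))

n = 30
ratio = factorial(n) / n ** n / (sqrt(2 * pi * n) * exp(-n))
print(alternating_sum(5), Fraction(factorial(5), 5 ** 5), ratio)
```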


What’s $11^4$
radix 11?

But not Imbesselian.


Each value of a 
hypergeometric 
term t(k) can be 
written $0^{e(k)}v(k)$,
where $e(k)$ is
an integer and
$v(k) \ne 0$. Sup-
pose the term ratio
$t(k+1)/t(k)$ is
$p(k)/q(k)$, and
that p and q have 
been completely 
factored over the 
complex numbers. 
Then, for each k, 
e(k + 1 ) is e(k) 
plus the number of 
zero factors of p(k) 
minus the number 
of zero factors of 
q(k) , and v(k + 1 ) 
is v(k) times the 
product of the 
nonzero factors 
of p(k) divided 
by the product of 
the nonzero factors 
of q(k) . 


5.9 $\mathcal E_t(z)^t = \sum_{k\ge0} t(tk+t)^{k-1}z^k/k! = \sum_{k\ge0}(k+1)^{k-1}(tz)^k/k! = \mathcal E_1(tz)$,
by (5.60).

5.10 $\sum_{k\ge0} 2z^k/(k+2) = F(2, 1; 3; z)$, since $t_{k+1}/t_k = (k+2)z/(k+3)$.

5.11 The first is Besselian and the second is Gaussian:
$$z^{-1}\sin z = \sum_{k\ge0}(-1)^kz^{2k}/(2k+1)! = F(1; 1, \tfrac32; -z^2/4);$$
$$z^{-1}\arcsin z = \sum_{k\ge0}z^{2k}(\tfrac12)^{\overline k}/(2k+1)\,k! = F(\tfrac12, \tfrac12; \tfrac32; z^2).$$

5.12 (a) Yes, if $n \ne 0$, since the term ratio is $n$. (b) Yes, when $n$ is an
integer; the term ratio is $(k+1)^n/k^n$. Notice that we get this term from
(5.115) by setting $m = n + 1$, $a_1 = \cdots = a_m = 1$, $b_1 = \cdots = b_n = 0$, $z = 1$,
and multiplying by $0^n$. (c) Yes, the term ratio is $(k+1)(k+3)/(k+2)$. (d) No,
the term ratio is $1 + 1/\bigl((k+1)H_k\bigr)$; and $H_k \sim \ln k$ isn’t a rational function.
(e) Yes, the reciprocal of any hypergeometric term is a hypergeometric term.
The fact that $t(k) = \infty$ when $k < 0$ or $k > n$ does not exclude $t(k)$ from
hypergeometric termhood. (f) Of course. (g) Not when, say, $t(k) = 2^k$ and
$T(k) = 1$. (h) Yes; the term ratio $t(n-1-k)/t(n-1-(k+1))$ is a rational
function (the reciprocal of the term ratio for $t$, with $k$ replaced by $n-1-k$),
for arbitrary $n$. (i) Yes; the term ratio can be written
$$\frac{a\,t(k+1)/t(k) + b\,t(k+2)/t(k) + c\,t(k+3)/t(k)}{a + b\,t(k+1)/t(k) + c\,t(k+2)/t(k)}$$
and $t(k+m)/t(k) = \bigl(t(k+m)/t(k+m-1)\bigr)\ldots\bigl(t(k+1)/t(k)\bigr)$ is a rational
function of $k$. (j) No. Whenever two rational functions $p_1(k)/q_1(k)$ and
$p_2(k)/q_2(k)$ are equal for infinitely many $k$, they are equal for all $k$, because
$p_1(k)q_2(k) = q_1(k)p_2(k)$ is a polynomial identity. Therefore the term ratio
$\lceil(k+1)/2\rceil/\lceil k/2\rceil$ would have to equal 1 if it were a rational function. (k) No.
The term ratio would have to be $(k+1)/k$, since it is $(k+1)/k$ for all $k > 0$;
but then $t(-1)$ can be zero only if $t(0)$ is a multiple of $0^2$, while $t(1)$ can be 1
only if $t(0) = 0^1$.

5.13 R n = nT + 1 /P 2 = Qn/Pn = Q^/nT +1 . 

5.14 The first factor in (5.25) is $\binom{l-k}{l-k-m}$ when $k \le l$, so it’s $(-1)^{l-k-m}\binom{-m-1}{l-k-m}$.
The sum for $k \le l$ is the sum over all $k$, since $m \ge 0$. (The
condition $n \ge 0$ isn’t really needed, although $k$ must assume negative values
if $n < 0$.)

To go from (5.25) to (5.26), first replace $s$ by $-1 - n - q$.

5.15 If $n$ is odd, the sum is zero, since we can replace $k$ by $n - k$. If $n = 2m$,
the sum is $(-1)^m(3m)!/m!^3$, by (5.29) with $a = b = c = m$.
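The two cases of 5.15 can be confirmed by computing the sum directly. The sketch assumes that the exercise asks for $\sum_k(-1)^k\binom nk^3$, the usual special case of Dixon's identity (5.29). The exercise statement is not repeated here, so this is an assumption.

```python
from math import comb, factorial

def cube_sum(n):
    """sum_k (-1)^k C(n,k)^3."""
    return sum((-1) ** k * comb(n, k) ** 3 for k in range(n + 1))

def predicted(n):
    """0 for odd n, and (-1)^m (3m)!/m!^3 for n = 2m."""
    if n % 2:
        return 0
    m = n // 2
    return (-1) ** m * (factorial(3 * m) // factorial(m) ** 3)

print([cube_sum(n) for n in range(7)])
```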
5.16 This is just $(2a)!\,(2b)!\,(2c)!/(a+b)!\,(b+c)!\,(c+a)!$ times (5.29), if we
write the summands in terms of factorials.

5.17 The formulas $\binom{2n-1/2}{n} = \binom{4n}{2n}/2^{2n}$ and $\binom{2n-1/2}{2n} = \binom{4n}{2n}/2^{4n}$ yield
$\binom{2n-1/2}{n} = 2^{2n}\binom{2n-1/2}{2n}$.

5.18 $\binom{3r}{3k}\binom{3k}{k,k,k}\big/3^{3k}$.

5.19 $\mathcal B_{1-t}(-z)^{-1} = \sum_{k\ge0}\binom{k-tk-1}{k}\bigl(-1/(k-tk-1)\bigr)(-z)^k$, by (5.60), and
this is $\sum_{k\ge0}\binom{tk}{k}\bigl(1/(tk-k+1)\bigr)z^k = \mathcal B_t(z)$.

5.20 It equals $F(-a_1, \ldots, -a_m; -b_1, \ldots, -b_n; (-1)^{m+n}z)$; see exercise 2.17.

5.21 $\lim_{n\to\infty}(n+m)^{\underline m}/n^m = 1$.

5.22 Multiplying and dividing instances of (5.83) gives
$$\frac{(-1/2)!}{x!\,(x-1/2)!} = \lim_{n\to\infty}\binom{n+x}{n}\binom{n+x-1/2}{n}\binom{n-1/2}{n}^{-1}n^{-2x} = \lim_{n\to\infty}\binom{2n+2x}{2n}n^{-2x},$$
by (5.34) and (5.36). Also
$$1/(2x)! = \lim_{n\to\infty}\binom{2n+2x}{2n}(2n)^{-2x}.$$
Hence, etc. The Gamma function equivalent, incidentally, is
$$\Gamma(x)\,\Gamma(x+\tfrac12) = \Gamma(2x)\,\Gamma(\tfrac12)/2^{2x-1}.$$


5.23 $(-1)^n n¡$, see (5.50).

5.24 This sum is (£) F( m_ 1 /) 2 _m | l) = (f£), by ( 5 . 35 ) and ( 5 . 93 ). 

5.25 This is equivalent to the easily proved identity
$$(a-b)\frac{a^{\overline k}}{(b+1)^{\overline k}} = a\frac{(a+1)^{\overline k}}{(b+1)^{\overline k}} - b\frac{a^{\overline k}}{b^{\overline k}}$$
as well as to the operator formula $a - b = (\vartheta + a) - (\vartheta + b)$.
Similarly, we have
$$(a_1 - a_2)F(a_1, a_2, a_3, \ldots, a_m; b_1, \ldots, b_n; z) = a_1F(a_1+1, a_2, a_3, \ldots, a_m; b_1, \ldots, b_n; z) - a_2F(a_1, a_2+1, a_3, \ldots, a_m; b_1, \ldots, b_n; z),$$
because $a_1 - a_2 = (a_1 + k) - (a_2 + k)$. If $a_1 - b_1$ is a nonnegative integer $d$, this
second identity allows us to express $F(a_1, \ldots, a_m; b_1, \ldots, b_n; z)$ as a linear
combination of $F(a_2 + j, a_3, \ldots, a_m; b_2, \ldots, b_n; z)$ for $0 \le j \le d$, thereby
eliminating an upper parameter and a lower parameter. Thus, for example,
we get closed forms for $F(a, b; a-1; z)$, $F(a, b; a-2; z)$, etc.


Equating coeffi-
cients of $z^n$ gives
the Pfaff–Saalschütz
formula (5.97).

Gauss [143, §7] derived analogous relations between F(a,b;c;z) and 
any two “contiguous” hypergeometrics in which a parameter has been changed 
by ±1. Rainville [301] generalized this to cases with more parameters. 

5.26 If the term ratio in the original hypergeometric series is $t_{k+1}/t_k = r(k)$,
the term ratio in the new one is $t_{k+2}/t_{k+1} = r(k+1)$. Hence
$$F(a_1, \ldots, a_m; b_1, \ldots, b_n; z) = 1 + \frac{a_1 \ldots a_m z}{b_1 \ldots b_n}F(a_1+1, \ldots, a_m+1, 1; b_1+1, \ldots, b_n+1, 2; z).$$



5.27 This is the sum of the even terms of $F(2a_1, \ldots, 2a_m; 2b_1, \ldots, 2b_m; z)$.
We have $(2a)^{\overline{2k+2}}/(2a)^{\overline{2k}} = 4(k+a)(k+a+\frac12)$, etc.

5.28 $F(a, b; c; z) = (1-z)^{-a}F\bigl(a, c-b; c; \frac{-z}{1-z}\bigr) = (1-z)^{-a}F\bigl(c-b, a; c; \frac{-z}{1-z}\bigr) =
(1-z)^{c-a-b}F(c-a, c-b; c; z)$. (Euler proved the identity by showing that both
sides satisfy the same differential equation. The reflection law is often at-
tributed to Euler, but it does not seem to appear in his published papers.)

5.29 The coefficients of $z^n$ are equal, by Vandermonde’s convolution. (Kum-
mer’s original proof was different: He considered $\lim_{m\to\infty}F(m, b-a; b; z/m)$
in the reflection law (5.101).)

5.30 Differentiate again to get $z(1-z)F''(z) + (2 - 3z)F'(z) - F(z) = 0$.
Therefore $F(z) = F(1, 1; 2; z)$ by (5.108).

5.31 The condition $f(k) = T(k+1) - T(k)$ implies that $f(k+1)/f(k) =
\bigl(T(k+2)/T(k+1) - 1\bigr)/\bigl(1 - T(k)/T(k+1)\bigr)$ is a rational function of $k$.

5.32 When summing a polynomial in k, Gosper’s method reduces to the 
“method of undetermined coefficients.” We have q(k) = r(k) = 1, and we 
try to solve p(k) = s(k + 1) — s(k). The method suggests letting s(k) be a 
polynomial whose degree is d = deg(p) + 1 . 

5.33 The solution to $k = (k-1)s(k+1) - (k+1)s(k)$ is $s(k) = -k + \frac12$,
hence the answer is $(1-2k)/2k(k-1) + C$.

5.34 The limiting relation holds because all terms for $k > c$ vanish, and
$\epsilon - c$ cancels with $-c$ in the limit of the other terms. Therefore the second
partial sum is $\lim_{\epsilon\to0}F(-m, -n; \epsilon - m; 1) = \lim_{\epsilon\to0}(\epsilon + n - m)^{\overline m}/(\epsilon - m)^{\overline m} =
(-1)^m\binom{n-1}{m}$.

5.35 (a) $2^{-n}3^n[n \ge 0]$. (b) $(1 - \frac12)^{-k-1}[k \ge 0] = 2^{k+1}[k \ge 0]$.
5.36 The sum of the digits of $m + n$ is the sum of the digits of $m$ plus the
sum of the digits of $n$, minus $p - 1$ times the number of carries, because each
carry decreases the digit sum by $p - 1$. [See [226] for extensions of this result
to generalized binomial coefficients.]
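The digit-sum relation in 5.36 can be checked by adding numbers digit by digit and counting carries. Combined with Legendre's formula, it gives the well-known consequence (Kummer's theorem) that the exponent of $p$ in $\binom{m+n}{m}$ equals the number of carries. The second assertion in the test checks that consequence as well. Helper names are illustrative.

```python
from math import comb

def digit_sum(x, p):
    """Sum of the radix-p digits of x."""
    total = 0
    while x:
        x, d = divmod(x, p)
        total += d
    return total

def carries(m, n, p):
    """Number of carries when m and n are added in radix p."""
    carry = count = 0
    while m or n or carry:
        carry = 1 if m % p + n % p + carry >= p else 0
        count += carry
        m //= p
        n //= p
    return count

def p_exponent(x, p):
    """Largest e such that p^e divides the positive integer x."""
    e = 0
    while x % p == 0:
        x //= p
        e += 1
    return e

print(carries(7, 5, 2), p_exponent(comb(12, 7), 2))
```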

5.37 Dividing the first identity by $n!$ yields $\binom{x+y}{n} = \sum_k\binom xk\binom{y}{n-k}$, Van-
dermonde’s convolution. The second identity follows, for example, from the
formula $x^{\overline k} = (-1)^k(-x)^{\underline k}$ if we negate both $x$ and $y$.

5.38 Choose $c$ as large as possible such that $\binom c3 \le n$. Then $0 \le n - \binom c3 <
\binom{c+1}{3} - \binom c3 = \binom c2$; replace $n$ by $n - \binom c3$ and continue in the same fashion.
Conversely, any such representation is obtained in this way. (We can do the
same thing with
$$n = \binom{a_1}{1} + \binom{a_2}{2} + \cdots + \binom{a_m}{m}$$
for any fixed $m$.)
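The greedy procedure in 5.38 translates directly into code. The sketch below builds a brute-force table of $\binom a1 + \binom b2 + \binom c3$ over all $0 \le a < b < c < 20$, then checks that every $n < \binom{20}{3}$ appears exactly once and that the greedy choice recovers that unique triple. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def greedy_digits(n, m=3):
    """Greedy a_m > ... > a_1 >= 0 with n = C(a_1,1) + ... + C(a_m,m); returns [a_1, ..., a_m]."""
    digits = []
    for j in range(m, 0, -1):
        c = j - 1                      # C(j-1, j) = 0 <= n, so this is a valid start
        while comb(c + 1, j) <= n:     # largest c with C(c, j) <= n
            c += 1
        digits.append(c)
        n -= comb(c, j)
    return digits[::-1]

# Brute-force table of every representation with 0 <= a < b < c < 20.
table = {}
for a, b, c in combinations(range(20), 3):
    table.setdefault(comb(a, 1) + comb(b, 2) + comb(c, 3), []).append([a, b, c])

print(greedy_digits(100))
```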


5.39 x m y’ 


= ]T™ =1 ( m+ ^;- k )a-b— k x k + Lk =1 ( m+ - a n ^ k b mi ,k 


m— 1 


for all mn > 0, by induction on m + n. 

5.40 Mr*' n=,£i", ( pctt 1 ) = 

= (-nrr- 1 ) -r = ( r 


( 


‘) -(A)- 


5.41 $\sum_{k\ge0} n!/(n-k)!\,(n+k+1)! = \bigl(n!/(2n+1)!\bigr)\sum_{k>n}\binom{2n+1}{k}$, which is
$2^{2n}n!/(2n+1)!$.

5.42 We treat $n$ as an indeterminate real variable. Gosper’s method with
$q(k) = k + 1$ and $r(k) = k - 1 - n$ has the solution $s(k) = 1/(n+2)$; hence
the desired indefinite sum is $(-1)^{x-1}\frac{n+1}{n+2}\binom{n+1}{x}^{-1}$. And
$$\sum_{k=0}^{n}(-1)^k\binom nk^{-1} = (-1)^{x-1}\frac{n+1}{n+2}\binom{n+1}{x}^{-1}\Bigg|_0^{n+1} = 2\,\frac{n+1}{n+2}\,[n \text{ even}].$$
This exercise, incidentally, implies the formula
$$\binom nk^{-1} = \frac{n+1}{n+2}\left(\binom{n+1}{k}^{-1} + \binom{n+1}{k+1}^{-1}\right),$$
a “dual” to the basic recurrence (5.8).
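All three formulas in 5.42 can be checked with exact fractions: the indefinite sum $T(x)$ has difference $(-1)^k/\binom nk$, the definite sum is $T(n+1) - T(0)$, and the “dual” recurrence holds. This is a verification sketch, not Gosper's algorithm itself. The names are illustrative.

```python
from fractions import Fraction
from math import comb

def T(x, n):
    """(-1)^(x-1) (n+1)/(n+2) / C(n+1, x), for 0 <= x <= n+1."""
    sign = -1 if x % 2 == 0 else 1       # (-1)^(x-1)
    return Fraction(sign * (n + 1), (n + 2) * comb(n + 1, x))

def alternating_reciprocals(n):
    """sum_{k=0}^{n} (-1)^k / C(n, k)."""
    return sum(Fraction((-1) ** k, comb(n, k)) for k in range(n + 1))

print([alternating_reciprocals(n) for n in range(5)])
```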

5.43 After the hinted first step we can apply (5.21) and sum on $k$. Then
(5.21) applies again and Vandermonde’s convolution finishes the job. (A com-
binatorial proof of this identity has been given by Andrews [10]. There’s a
quick way to go from this identity to a proof of (5.29), explained in [207,
exercise 1.2.6–62].)
The boxed
sentence 
on the 
other side 
of this page 
is true. 


5.44 Cancellation of factorials shows that
$$\binom mj\binom nk\Big/\binom{m+n}{j+k} = \binom{m+n-j-k}{m-j}\binom{j+k}{j}\Big/\binom{m+n}{m},$$
so the second sum is $1/\binom{m+n}{m}$ times the first. And the first is just the special
case 1 = 0, n = b, r = a, s = m + n — bof (5.32), so it is ( a ^ b ) ( m+ n-a~ b )- 

5.45 According to (5.9), $\sum_{k\le n}\binom{k-1/2}{k} = \binom{n+1/2}{n}$. If this form of the
answer isn’t “closed” enough, we can apply (5.35) and get $(2n+1)\binom{2n}{n}4^{-n}$.

5.46 By (5.69), this convolution is the negative of the coefficient of $z^{2n}$
in $\mathcal B_{-1}(z)\mathcal B_{-1}(-z)$. Now $(2\mathcal B_{-1}(z) - 1)(2\mathcal B_{-1}(-z) - 1) = \sqrt{1 - 16z^2}$; hence
$\mathcal B_{-1}(z)\mathcal B_{-1}(-z) = \frac14\sqrt{1 - 16z^2} + \frac12\mathcal B_{-1}(z) + \frac12\mathcal B_{-1}(-z) - \frac14$. By the binomial
theorem,
$$(1 - 16z^2)^{1/2} = \sum_n\binom{1/2}{n}(-16)^nz^{2n} = -\sum_n\binom{2n}{n}\frac{4^nz^{2n}}{2n-1},$$
so the answer is $\binom{2n}{n}4^{n-1}/(2n-1) + \binom{4n-1}{2n}/(4n-1)$.
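The closed form in 5.46 can be compared with a direct evaluation of the convolution. The exact convolution is not restated here, so the sketch assumes it is $\sum_k \binom{2k-1}{k}\binom{4n-2k-1}{2n-k}(-1)^{k-1}/\bigl((2k-1)(4n-2k-1)\bigr)$. This is the negated coefficient of $z^{2n}$ in $\mathcal B_{-1}(z)\mathcal B_{-1}(-z)$, derived from $[z^k]\mathcal B_{-1}(z) = \binom{-k}{k}/(1-2k)$. Binomials with a negative upper index use the generalized definition.

```python
from fractions import Fraction
from math import comb

def gbinom(r, k):
    """Generalized C(r, k) for integer r (negative allowed) and integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(r - i, i + 1)
    return result

def convolution(n):
    """Assumed form of the 5.46 convolution (see lead-in)."""
    total = Fraction(0)
    for k in range(2 * n + 1):
        sign = -(-1) ** k                       # (-1)^(k-1), kept as an int
        total += (gbinom(2 * k - 1, k) * gbinom(4 * n - 2 * k - 1, 2 * n - k) * sign
                  / ((2 * k - 1) * (4 * n - 2 * k - 1)))
    return total

def closed_form(n):
    return Fraction(comb(2 * n, n) * 4 ** (n - 1), 2 * n - 1) + Fraction(comb(4 * n - 1, 2 * n), 4 * n - 1)

print(convolution(1), convolution(2))
```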


5.47 It’s the coefficient of z n in ( < B r (z) s /Q r (z)) (*B t (z) S /Q r (z)) = Q r (z) 2 , 
where Q r (z) = 1 — r + r‘B r (z) _1 , by (5.61). 

5.48 $F(2n+2, 1; n+2; \frac12) = 2^{2n+1}\big/\binom{2n+1}{n+1}$, a special case of (5.111).

5.49 Saalschütz’s identity (5.97) yields
$$\binom{x+n}{n}\frac{y}{y+n}F(-x, -n, -n-y; -x-n, 1-n-y; 1) = \frac{(y-x)^{\overline n}}{(y+1)^{\overline n}}.$$

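5.50 The left-hand side is
$$\sum_{k\ge0}\frac{a^{\overline k}b^{\overline k}(-z)^k}{c^{\overline k}\,k!}\sum_{m\ge0}\binom{a+k+m-1}{m}z^m = \sum_{n\ge0}z^n\sum_{k\ge0}\frac{a^{\overline k}b^{\overline k}}{c^{\overline k}\,k!}(-1)^k\binom{n+a-1}{n-k},$$
and the coefficient of $z^n$ is
$$\binom{n+a-1}{n}F(a, b, -n; c, a; 1) = \frac{a^{\overline n}(c-b)^{\overline n}}{n!\,c^{\overline n}}$$
by Vandermonde’s convolution (5.92).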

5.51 (a) Reflection gives $F(a, -n; 2a; 2) = (-1)^nF(a, -n; 2a; 2)$. (Inciden-
tally, this formula implies the remarkable identity $\Delta^{2m+1}f(0) = 0$, when
$f(n) = 2^nx^{\underline n}/(2x)^{\underline n}$.)
(b) The term-by-term limit is ]T 0 ,, k ., m ( k ) 2 m+l-k (~ 2 ) k plus an ad-
ditional term for $k = 2m - 1$. The additional term is


(— m) . . . (-1 ) ( 1 ) . . . (m) (-2m + 1 ) . . . (-1 ) 2 2m+1 
(—2m) . . . (—1 ) (2m — 1 )! 

+ 1 m!m!2^+ ] -2 

( 2 m)! (-V 2 )’ 

hence, by ( 5 . 104 ), this limit is —1 / ( ^ 2 ), the negative of what we had. 

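5.52 The terms of both series are zero for $k > N$. This identity corresponds
to replacing $k$ by $N - k$. Notice that
$$a^{\overline{N-k}} = \frac{a^{\overline N}}{(a+N-k)^{\overline k}} = \frac{a^{\overline N}}{(a+N-1)^{\underline k}} = \frac{a^{\overline N}(-1)^k}{(1-a-N)^{\overline k}}.$$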


5.53 When $b = -\frac12$, the left side of (5.110) is $1 - 2z$ and the right side is
$(1 - 4z + 4z^2)^{1/2}$, independent of $a$. The right side is the formal power series
$$1 + \binom{1/2}{1}4z(z-1) + \binom{1/2}{2}16z^2(z-1)^2 + \cdots,$$
which can be expanded and rearranged to give $1 - 2z + 0z^2 + 0z^3 + \cdots$; but the
rearrangement involves divergent series in its intermediate steps when $z = 1$,
so it is not legitimate.


The boxed 
sentence 
on the 
other side 
of this page 
is false. 


5.54 If m + n is odd, say 2N — 1 , we want to show that 


lim F 

€ — »0 


/ N — m— j 1 — N + e 

\ — m+e 


= 0 . 


Equation (5.92) applies, since $-m + \epsilon > -m - \frac12 + \epsilon$, and the denominator
factor $\Gamma(c - b) = \Gamma(N - m)$ is infinite since $N \le m$; the other factors are finite.
Otherwise $m + n$ is even; setting $n = m - 2N$ we have


lim F 

e— »0 


/— N, N— m — 2 + e 

V —m+e 


(N - 1/2)— 


by (5.93). The remaining job is to show that

( m \ (N- 1/2)1 (m-N)! = /m-N\ 2N 
\m — 2N/ (-1/2)! m! \m-mj 

and this is the case x = N of exercise 22. 
5.55 Let $Q(k) = (k + A_1)\ldots(k + A_M)Z$ and $R(k) = (k + 1 + B_1)\ldots(k + 1 + B_N)$.
Then $t(k+1)/t(k) = P(k)Q(k-1)/P(k-1)R(k)$, where $P(k) = Q(k) - R(k)$
is a nonzero polynomial.

5.56 The solution to $-(k+1)(k+2) = s(k+1) + s(k)$ is $s(k) = -\frac12k^2 - k - \frac14$;
hence $\sum\binom{-3}{k}\delta k = \frac18(-1)^{k-1}(2k^2 + 4k + 1) + C$. Also


(-1 


,k-i 


k +1 

k + 2 

2 

2 


(-1 


\k— 1 


4 

8 


k + 1 - 




k + 2 - 


l-(-1] 


(2k 2 + 4k + 1 ) + 


5.57 We have $t(k+1)/t(k) = (k-n)(k+1+\theta)(-z)/(k+1)(k+\theta)$. Therefore
we let $p(k) = k + \theta$, $q(k) = (k-n)(-z)$, $r(k) = k$. The secret function $s(k)$
must be a constant $\alpha_0$, and we have
$$k + \theta = \bigl(-z(k-n) - k\bigr)\alpha_0;$$
hence $\alpha_0 = -1/(1+z)$ and $\theta = -nz/(1+z)$. The sum is
$$\sum\binom nk z^k\Bigl(k - \frac{nz}{1+z}\Bigr)\delta k = -\frac{n}{1+z}\binom{n-1}{k-1}z^k + C.$$
(The special case $z = 1$ was mentioned in (5.18).)
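The indefinite sum found by Gosper's method in 5.57 can be confirmed by taking its forward difference, using exact fractions for several rational values of $z \ne -1$. The sketch verifies the result only. It does not implement Gosper's algorithm, and the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom(n, k):
    """C(n, k) for n >= 0, with C(n, k) = 0 when k < 0 or k > n."""
    return comb(n, k) if 0 <= k <= n else 0

def summand(n, k, z):
    """C(n,k) z^k (k - nz/(1+z))."""
    return comb(n, k) * z ** k * (k - n * z / (1 + z))

def antidifference(n, k, z):
    """-n/(1+z) * C(n-1, k-1) z^k, the claimed indefinite sum."""
    return -n / (1 + z) * binom(n - 1, k - 1) * z ** k

zs = [Fraction(1), Fraction(2), Fraction(1, 3), Fraction(-1, 2), Fraction(5, 7)]
print(antidifference(4, 3, Fraction(1)) - antidifference(4, 2, Fraction(1)), summand(4, 2, Fraction(1)))
```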


5.58 If $m > 0$ we can replace $\binom km$ by $\frac km\binom{k-1}{m-1}$ and derive the formula
$$T_{m,n} = \frac nm T_{m-1,n-1} - \frac1m\binom{n-1}{m}.$$
The summation factor $\binom nm^{-1}$ is therefore appropriate:
$$\frac{T_{m,n}}{\binom nm} = \frac{T_{m-1,n-1}}{\binom{n-1}{m-1}} - \frac1m + \frac1n.$$
We can unfold this to get
$$\frac{T_{m,n}}{\binom nm} = T_{0,n-m} - H_m + H_n - H_{n-m}.$$
Finally $T_{0,n-m} = H_{n-m}$, so $T_{m,n} = \binom nm(H_n - H_m)$. (It’s also possible to
derive this result by using generating functions; see Example 2 in Section 7.5.)
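The closed form $T_{m,n} = \binom nm(H_n - H_m)$ from 5.58 can be checked by brute force with exact fractions. The sketch assumes $T_{m,n} = \sum_{0\le k<n}\binom km/(n-k)$. That definition is not restated above, but it is consistent with $T_{0,n} = H_n$ and with the recurrence derived there.

```python
from fractions import Fraction
from math import comb

def H(n):
    """Harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def T(m, n):
    """Assumed definition: sum_{0 <= k < n} C(k, m) / (n - k)."""
    return sum(Fraction(comb(k, m), n - k) for k in range(n))

print(T(1, 3), comb(3, 1) * (H(3) - H(1)))
```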

5.59 $\sum_{j\ge0,\,k\ge1}\binom nj[j = \lfloor\log_m k\rfloor] = \sum_{j\ge0,\,k\ge1}\binom nj[m^j \le k < m^{j+1}]$, which is
$\sum_{j\ge0}\binom nj(m^{j+1} - m^j) = (m-1)\sum_{j\ge0}\binom nj m^j = (m-1)(m+1)^n$.
5.60 $\binom{2n}{n} \approx 4^n/\sqrt{\pi n}$ is the case $m = n$ of



5.61 Let $\lfloor n/p\rfloor = q$ and $n \bmod p = r$. The polynomial identity $(x+1)^p \equiv
x^p + 1 \pmod p$ implies that
$$(x+1)^{pq+r} \equiv (x+1)^r(x^p+1)^q \pmod p.$$
The coefficient of $x^m$ on the left is $\binom nm$. On the right it’s $\sum_k\binom{r}{m-pk}\binom qk$,
which is just $\binom{r}{m \bmod p}\binom{q}{\lfloor m/p\rfloor}$ because $0 \le r < p$.
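Applying the step of 5.61 repeatedly to $q = \lfloor n/p\rfloor$ gives Lucas's theorem: $\binom nm \bmod p$ is the product of the binomial coefficients of corresponding radix-$p$ digits. A minimal sketch, compared against `math.comb`:

```python
from math import comb

def lucas(n, m, p):
    """C(n, m) mod p, computed one radix-p digit at a time."""
    result = 1
    while n or m:
        # comb(a, b) is 0 when b > a, which handles m > n automatically.
        result = result * comb(n % p, m % p) % p
        n //= p
        m //= p
    return result

print(lucas(100, 37, 7), comb(100, 37) % 7)
```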

5.62 $\binom{np}{mp} = \sum_{k_1+\cdots+k_n=mp}\binom{p}{k_1}\ldots\binom{p}{k_n} \equiv \binom nm \pmod{p^2}$, because all
terms of the sum are multiples of $p^2$ except for the $\binom nm$ terms in which
exactly $m$ of the $k$’s are equal to $p$. (Stanley [335, exercise 1.6(d)] shows that
the congruence actually holds modulo $p^3$ when $p > 3$.)
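A small table check of both statements in 5.62 follows: the congruence modulo $p^2$ for every prime, and the stronger modulo-$p^3$ version for $p > 3$. It also confirms that the $p^3$ version can fail for $p = 3$ (for example $\binom63 = 20 \not\equiv 2 \pmod{27}$). The helper name is illustrative.

```python
from math import comb

def holds(p, power, limit=8):
    """Does C(np, mp) = C(n, m) (mod p^power) hold for all 0 <= m <= n < limit?"""
    mod = p ** power
    return all((comb(n * p, m * p) - comb(n, m)) % mod == 0
               for n in range(limit) for m in range(n + 1))

print([holds(p, 2) for p in (2, 3, 5, 7)], [holds(p, 3) for p in (3, 5, 7)])
```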

5.63 This is $S_n = \sum_{k=0}^{n}(-4)^k\binom{n+k}{n-k}$. The de-
nominator of (5.74) is zero when $z = -1/4$, so we can’t simply plug into
that formula. The recurrence $S_n = -2S_{n-1} - S_{n-2}$ leads to the solution
$$S_n = (-1)^n(2n+1).$$
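The solution of the recurrence in 5.63 can be compared with the sum evaluated term by term. The sketch reads the garbled summand as $(-4)^k\binom{n+k}{n-k}$, so treat that reading as an assumption. The small cases $n = 0, 1, 2, 3$ give $1, -3, 5, -7$ either way.

```python
from math import comb

def S(n):
    """sum_{k=0}^{n} (-4)^k C(n+k, n-k)."""
    return sum((-4) ** k * comb(n + k, n - k) for k in range(n + 1))

print([S(n) for n in range(6)])
```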

5.64 ^k^o ( (2k) + Gk+i)) ^ ~ ILk^o Gk+i )/(k + 1 ), which is 



2 * 1+2 2 

u + 2 


5.65 Multiply both sides by $n^{n-1}$ and replace $k$ by $n - 1 - k$ to get
$$\sum_{0\le k<n}\binom{n-1}{k}n^k(n-k)! = (n-1)!\sum_{0\le k<n}\Bigl(\frac{n^{k+1}}{k!} - \frac{n^k}{(k-1)!}\Bigr) = (n-1)!\,n^n/(n-1)!.$$
(The partial sums can, in fact, be found by Gosper’s algorithm.) Alternatively,
$\binom nk k\,n^{n-1-k}k!$ can be interpreted as the number of mappings of $\{1, \ldots, n\}$ into
itself with $f(1), \ldots, f(k)$ distinct but $f(k+1) \in \{f(1), \ldots, f(k)\}$; summing on $k$
must give $n^n$.
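The combinatorial interpretation in 5.65 can be tested by classifying every map $f\colon\{1,\ldots,n\}\to\{1,\ldots,n\}$ by the position of its first repeated value. The $k = n$ class consists of the permutations, where the formula gives $n!$. The telescoped sum is also checked against $n^n$. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def per_k_formula(n, k):
    """C(n,k) * k * k! * n^(n-1-k); exact even when the exponent is -1."""
    return comb(n, k) * k * factorial(k) * Fraction(n) ** (n - 1 - k)

def per_k_brute(n):
    """Count maps whose values f(1..k) are distinct and f(k+1) repeats one of them."""
    counts = [0] * (n + 1)
    for f in product(range(n), repeat=n):
        k = n                          # no repeat at all: f is a permutation
        for i in range(n):
            if f[i] in f[:i]:
                k = i
                break
        counts[k] += 1
    return counts

def telescoped(n):
    return sum(comb(n - 1, k) * n ** k * factorial(n - k) for k in range(n))

print(per_k_brute(3), [per_k_formula(3, k) for k in range(4)])
```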

5.66 This is a walk-the-garden-path problem where there’s only one “ob- 
vious” way to proceed at every step. First replace k — j by l, then replace 
iVA by k, getting 
The boxed
sentence 
on the 
other side 
of this page 
is not a 
sentence. 


The infinite series converges because the terms for fixed $j$ are dominated by
a polynomial in $j$ divided by $2^j$. Now sum over $k$, getting


z 



) + 1 

25 


Absorb the j + 1 and apply (5.57) to get the answer, 4(m + 1 ). 

5.67 ^ (5- 2 6)i because 



5.68 Using the fact that 



1 

2 



[n is even] , 


we get n(2 n 1 - (£/2j))- 

5.69 Since $\binom{k+1}{2} + \binom{l-1}{2} \le \binom k2 + \binom l2$ for $k < l$, the minimum occurs
when the $k$’s are as equal as possible. Hence, by the equipartition formula of
Chapter 3, the minimum is



A similar result holds for any lower index in place of 2. 

5.70 This is $F(-n, \frac12; 1; 2)$; but it’s also $(-2)^{-n}\binom{2n}{n}F(-n, -n; \frac12 - n; \frac12)$ if we
replace kby u— k. Now F(— n, — n; \ — u; j ) = F( — j, — j; n; 1 ) by Gauss’s 
identity (5.111). (Alternatively, F(— n, — n; j — n; j) = 2~ n F(— n, \\ \ — u; —1 ) 
by the reflection law (5.101), and Kummer’s formula (5.94) relates this to 
(5.55).) The answer is 0 when $n$ is odd, $2^{-n}\binom{n}{n/2}$ when $n$ is even. (See [164,
§1.2] for another derivation. This sum arises in the study of a simple search
algorithm [195].)

5.71 (a) Observe that
$$S(z) = \sum_k a_k\frac{z^{m+k}}{(1-z)^{m+2k+1}} = \frac{z^m}{(1-z)^{m+1}}A\bigl(z/(1-z)^2\bigr).$$
(b) Here $A(z) = \sum_k\binom{2k}{k}(-z)^k/(k+1) = (\sqrt{1+4z} - 1)/2z$, so we have
$A\bigl(z/(1-z)^2\bigr) = 1 - z$. Thus $S_n = [z^n]\bigl(z/(1-z)\bigr)^m = \binom{n-1}{n-m}$.
5.72 The stated quantity is $m(m-n)\ldots(m-(k-1)n)\,n^{k-\nu(k)}/k!$. Any
prime divisor $p$ of $n$ divides the numerator at least $k - \nu(k)$ times and di-
vides the denominator at most $k - \nu(k)$ times, since this is the number of
times 2 divides $k!$. A prime $p$ that does not divide $n$ must divide the prod-
uct $m(m-n)\ldots(m-(k-1)n)$ at least as often as it divides $k!$, because
$m(m-n)\ldots(m-(p^r-1)n)$ is a multiple of $p^r$ for all $r \ge 1$ and all $m$.

5.73 Plugging in $X_n = n!$ yields $\alpha = \beta = 1$; plugging in $X_n = n¡$ yields
$\alpha = 1$, $\beta = 0$. Therefore the general solution is $X_n = \alpha n¡ + \beta(n! - n¡)$.

5.74

5.75 The recurrence $S_k(n+1) = S_k(n) + S_{(k-1)\bmod 3}(n)$ makes it possible
to verify inductively that two of the $S$’s are equal and that $S_{(-n)\bmod 3}(n)$
differs from them by $(-1)^n$. These three values split their sum $S_0(n) + S_1(n) +
S_2(n) = 2^n$ as equally as possible, so there must be $2^n \bmod 3$ occurrences of
$\lceil 2^n/3\rceil$ and $3 - (2^n \bmod 3)$ occurrences of $\lfloor 2^n/3\rfloor$.
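The pattern described in 5.75 can be observed directly. The sketch assumes $S_k(n) = \sum_j \binom{n}{3j+k}$, which matches the recurrence $S_k(n+1) = S_k(n) + S_{(k-1)\bmod 3}(n)$ quoted above. For each $n$ it checks which residue class is the odd one out and how the total $2^n$ is split. The function names are hypothetical.

```python
from math import comb

def residue_sums(n):
    """[S_0(n), S_1(n), S_2(n)] with S_k(n) = sum_j C(n, 3j + k)."""
    return [sum(comb(n, i) for i in range(k, n + 1, 3)) for k in range(3)]

def matches_answer(n):
    S = residue_sums(n)
    odd = (-n) % 3                                  # index predicted to differ
    others = [S[i] for i in range(3) if i != odd]
    r = 2 ** n % 3
    expected = sorted([(2 ** n + 2) // 3] * r + [2 ** n // 3] * (3 - r))
    return (others[0] == others[1]
            and S[odd] - others[0] == (-1) ** n
            and sorted(S) == expected)

print([residue_sums(n) for n in range(5)])
```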

5.76 Q n ,,c = (n+ !)(£)- (,£,). 


5.77 The terms are zero unless $k_1 \le \cdots \le k_m$, when the product is the
multinomial coefficient
$$\binom{k_m}{k_1,\, k_2 - k_1,\, \ldots,\, k_m - k_{m-1}}.$$
Therefore the sum over $k_1, \ldots, k_{m-1}$ is $m^{k_m}$, and the final sum over $k_m$
yields $(m^{n+1} - 1)/(m-1)$.
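The argument in 5.77 can be checked by brute force. The sketch assumes the sum is over $0 \le k_1, \ldots, k_m \le n$ of $\binom{k_2}{k_1}\binom{k_3}{k_2}\ldots\binom{k_m}{k_{m-1}}$, consistent with the multinomial described above. It compares the full sum with the sum over nondecreasing tuples only, which confirms that the other terms vanish, and then with $(m^{n+1}-1)/(m-1)$.

```python
from itertools import combinations_with_replacement, product
from math import comb, prod

def chain_term(ks):
    """C(k_2, k_1) C(k_3, k_2) ... C(k_m, k_{m-1}); comb(a, b) = 0 when b > a."""
    return prod(comb(ks[i + 1], ks[i]) for i in range(len(ks) - 1))

def full_sum(m, n):
    return sum(chain_term(ks) for ks in product(range(n + 1), repeat=m))

def monotone_sum(m, n):
    # combinations_with_replacement yields exactly the nondecreasing tuples.
    return sum(chain_term(ks) for ks in combinations_with_replacement(range(n + 1), m))

print(full_sum(3, 1), monotone_sum(3, 1))
```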


The boxed 
sentence 
on the 
other side 
of this page 
is not boxed. 


5.78 Extend the sum to k = 2m 2 + m — 1; the new terms are (|) + ( 2 ) + 
• • • + ( T ^ m 1 ) = 0. Since m _L (2m + 1 ), the pairs (k mod m, k mod (2m + 1 )) 
are distinct. Furthermore, the numbers (2) + 1 ) mod (2m + 1 ) as j varies from 
0 to 2m are the numbers 0, 1 , . . . , 2m in some order. Hence the sum is 


L (■) = X 2k = 2 ™-'' 

O^kcm ^ ' 0^k<m 

0^j<2m+1 


5.79 (a) The sum is $2^{2n-1}$, so the gcd must be a power of 2. If $n = 2^kq$
where $q$ is odd, $\binom{2n}{1}$ is divisible by $2^{k+1}$ and not by $2^{k+2}$. Each $\binom{2n}{2j+1}$ is
divisible by $2^{k+1}$ (see exercise 36), so this must be the gcd. (b) If $p^r \le n + 1 <
p^{r+1}$, we get the most radix $p$ carries by adding $k$ to $n - k$ when $k = p^r - 1$.
The number of carries in this case is $r - e_p(n+1)$, and $r = e_p\bigl(L(n+1)\bigr)$.
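Part (a) of 5.79 predicts that $\gcd\bigl(\binom{2n}{1}, \binom{2n}{3}, \ldots, \binom{2n}{2n-1}\bigr) = 2^{k+1}$ when $2^k$ exactly divides $n$. This can be checked directly. The trick `n & -n` isolates the lowest set bit of `n`, which is $2^k$.

```python
from functools import reduce
from math import comb, gcd

def odd_binomial_gcd(n):
    """gcd of C(2n, 1), C(2n, 3), ..., C(2n, 2n-1)."""
    return reduce(gcd, (comb(2 * n, j) for j in range(1, 2 * n, 2)))

def predicted(n):
    k = (n & -n).bit_length() - 1      # 2^k exactly divides n
    return 2 ** (k + 1)

print([odd_binomial_gcd(n) for n in range(1, 9)])
```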

5.80 First prove by induction that $k! \ge (k/e)^k$.




A ANSWERS TO EXERCISES 537 


5.81 Let fi,m,n( x ) be the left-hand side. It is sufficient to show that we have 

f i,m,n ( 1 ) > 0 and that f{ mn (x) < 0 for 0 ^ x ^ 1 . The value of f ( 1 ) 
is (— 1 ( l4 L ^ 0 ) by (5.23), and this is positive because the binomial 

coefficient has exactly n — m — 1 negative factors. The inequality is true when 
1 = 0, for the same reason. If l > 0, we have f{ n (x) = — lfi-i,m,n+i (x), 
which is negative by induction. 

5.82 Let $e_p(a)$ be the exponent by which the prime $p$ divides $a$, and let
$m = n - k$. The identity to be proved reduces to
$$\min\bigl(e_p(m) - e_p(m+k),\ e_p(m+k+1) - e_p(k+1),\ e_p(k) - e_p(m+1)\bigr)$$
$$= \min\bigl(e_p(k) - e_p(m+k),\ e_p(m) - e_p(k+1),\ e_p(m+k+1) - e_p(m+1)\bigr).$$
For brevity let’s write this as $\min(x_1, y_1, z_1) = \min(x_2, y_2, z_2)$. Notice that
$x_1 + y_1 + z_1 = x_2 + y_2 + z_2$. The general relation
$$e_p(a) < e_p(b) \implies e_p(a) = e_p(|a \pm b|)$$
allows us to conclude that $x_1 \ne x_2 \implies \min(x_1, x_2) = 0$; the same holds also
for $(y_1, y_2)$ and $(z_1, z_2)$. It’s now a simple matter to complete the proof.

5.83 (Solution by P. Paule.) Let r be a nonnegative integer. The given sum 
is the coefficient of x l y m in 


- 

j,k 


= 1 - 


c )j+ k /-, 


Vi 


(1 +y)«+n-J-kyj 


(1 +x)y 


1 +x 
n +y)x 


= (-1T(1 -xyr +r (l +y) s - r /x 


1 


-y 


(l +y) s+n 


so it is clearly (— 1 i l (n+i) (m-n-i)' (S ee a ^ so exerc i se 106 .) 

5.84 Following the hint, we get 


z‘B t (z)’- 1 03 ((z) = £ 


k>0 


tk + r 
k 


kz k 

tk + r ’ 


and a similar formula for £ t (z). Thus the formulas (z+B^T 1 (z)‘B((z) + l)‘B t (z) r 
and (ztfi^T 1 (z)£((z) + 1)£ t (z) r give the respective right-hand sides of (5.61). 
We must therefore prove that 


(zt®^ 1 (z)®((z) + l)‘B t (z) r 
(ztE^T 1 (z)£((z) + l)£ t (z) r 


1 

1 - t + tB^z)- 1 ’ 

1 


and these follow from (5.59). 


1 -zt£(z)t ’ 
5.85 If $f(x) = a_nx^n + \cdots + a_1x + a_0$ is any polynomial of degree $\le n$, we
can prove inductively that
$$\sum_{0\le e_1,\ldots,e_n\le1}(-1)^{e_1+\cdots+e_n}f(e_1x_1 + \cdots + e_nx_n) = (-1)^n n!\,a_nx_1\ldots x_n.$$
The stated identity is the special case where $a_n = 1/n!$ and $x_k = k^3$.

5.86 (a) First expand with $n(n-1)$ index variables $l_{ij}$ for all $i \ne j$. Setting
$k_{ij} = l_{ij} - l_{ji}$ for $1 \le i < j \le n$ and using the constraints $\sum_{j\ne i}(l_{ij} - l_{ji}) = 0$
for all $i < n$ allows us to carry out the sums on $l_{jn}$ for $1 \le j < n$ and then
on $l_{ji}$ for $1 \le i < j < n$ by Vandermonde’s convolution. (b) $f(z) - 1$ is a
polynomial of degree $< n$ that has $n$ roots, so it must be zero. (c) Consider
the constant terms in


II 

"i^r 



l n 


k=1 

Mi 



5.87 The first term is ]T k ( n k k )z mk , by (5.61). The summands in the sec- 
ond term are 


~Y ( ^ + + (£ z ) k+n+1 

m — \ V / 


m 


k>0 




k >n 


(1+1/m)k — n — 1 
k-n- 1 


(Cz) k . 


Since /Lo<i<m(£ 2i+1 ) k = m(— 1 ) l [k = ml], these terms sum to 


L 

k>n/m 


f (1+1/m)mk — u — 1^ j_ z mA 
V mk — n — 1 


= L 

k>n/m 


/(m-H)k-n- l^ ( _ zmik 


= L 

k>n/m 


u — mk 
k 


.mk 


Incidentally, the functions !B m (z m ) and C 2 ’ +1 / m (C 2 ^ +1 z) 1 /™ are the 

m + 1 complex roots of the equation w m+1 — w m = z m . 

5.88 Use the facts that $\int_0^\infty(e^{-t} - e^{-nt})\,dt/t = \ln n$ and $(1 - e^{-t})/t \le 1$.
(We have $\binom xk = O(k^{-x-1})$ as $k \to \infty$, by (5.83); so this bound implies that
Stirling’s series $\sum_k s_k\binom xk$ converges when $x > -1$. Hermite [186] showed that
the sum is $\ln\Gamma(1+x)$.)
5.89 Adding this to (5.19) gives $y^{-r}(x+y)^{m+r}$ on both sides, by the bino-
mial theorem. Differentiation gives



and we can replace k by k + m + 1 and apply (5.15) to get 



In hypergeometric form, this reduces to 


( 1 — r, n+1 -x\ 

v m + 2 y ) 



f m+1 +r, n+1 
V m+2 



which is the special case (a, b, c, z) = (n + 1 , m + 1 + r, m + 2, — x/y ) of the 
reflection law (5.101). (Thus (5.105) is related to reflection and to the formula 
in exercise 52 .) 

5.90 If $r$ is a nonnegative integer, the sum is finite, and the derivation in
the text is valid as long as none of the terms of the sum for $0 \le k \le r$ has
zero in the denominator. Otherwise the sum is infinite, and the $k$th term
$\binom{k-r-1}{k}\big/\binom{k-s-1}{k}$ is approximately $k^{s-r}(-s-1)!/(-r-1)!$ by (5.83). So we
need $r > s + 1$ if the infinite series is going to converge. (If $r$ and $s$ are complex,
the condition is $\Re r > \Re s + 1$, because $|k^z| = k^{\Re z}$.) The sum is
$$F(-r, 1; -s; 1) = \frac{\Gamma(r-s-1)\,\Gamma(-s)}{\Gamma(r-s)\,\Gamma(-s-1)} = \frac{s+1}{s+1-r},$$
by (5.92); this is the same formula we found when $r$ and $s$ were integers.

5.91 (It’s best to have computer help for this.) Incidentally, when $c =
(a+1)/2$, this reduces to an identity that’s equivalent to Gauss’s identity
(5.110), in view of Pfaff’s reflection law. For if $w = -z/(1-z)$ we have
$4w(1-w) = -4z/(1-z)^2$, and


2 a, 2 a+ 2 — b 
1+a-b 


4 w( 1 — w) = F 


a, a+1 —2b 


1+a-b 
= (1-z) Q F 


— z 
1 — z 


a, b 

1+a-b 
5.92 The identities can be proved, as Clausen proved them more than 150
years ago, by showing that both sides satisfy the same differential equation.
One way to write the resulting equations between coefficients of $z^n$ is in terms
of binomial coefficients:


z 


n-k \n-k 


2r\ (r + s\ (2s 


n 


n 


n 


r + s — 1 /2\ /r + s — 1 /2 


n-k 


2 r + 2s\ (r + s — 1 /2\ ’ 


n 


n 


Z 


-1/4 + r\ /— 1/4 + s\ /— 1/4 — r\ /-1/4-s 


n-k 


— 1 + r + s\ /—l — r — s 
n-k 


n — k 


— 1/2\ /— 1/2 + r — s\ /-1/2-r + s 


n 


n 


n 


— 1 + r + s\ /—l — r — s 


n 


n 


Another way is in terms of hypergeometrics: 


a,b, j — a — b — n,— n 
2 + a+b, 1 — a— n, 1 — b— n 

5 + a, 5 +b, a + b-n, -n 
1 +a + b, l + a — n, |+b— n 


1 = 


(2a) n (a + b) n (2b) n _ 
(2a + 2b) w cCb^ ’ 


(l/2) n (1/2+a — b) n (4/2 — a + b) T 
(I + a + b) 7r (l/4 — a) w (1/4 — b)™ 


The boxed 
sentence 
on the 
other side 
of this page 
is not seif- 
referential. 


5.93 a 1 n]Li ( f (J) + a)/f(j). 

5.94 Gosper’s algorithm finds the answer — (/ J j ( a k ' ) a/n + C. Conse- 
quently, when m 0 is an integer, we have 




+ C. 


5.95 The leading coefficients of p and r should be unity, and p should have 
no factors in common with q or r. It is easy to fulfill these additional condi- 
tions by shuffling factors around. 

Now suppose p(k+ 1 )q(k)/p(k)r(k+ 1) = P(k+ 1 )Q(k)/P(k)R(k+ 1), 
where the polynomials (p, q,r) and (P, Q, R) both satisfy the new criteria. Let 
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Burma- 

Shave 


Po(lc) = p(k)/g(k) and P 0 (k) = P(k)/g(k), where g(k) = gcd(p(k), P(k)) is 
the product of all common factors of p and P. Then 

Po(k + 1)q(k)P 0 (k)R(k + 1) = p 0 (k)r(k + 1 )P 0 (k + 1 ) Q(k) . 

Suppose po(k) 7 b 1. Then there is a complex number oc such that po(a) = 0; 
this implies q(a) 7 ^ 0, r(a) 7 ^ 0, and Po(cx) 7 b 0 . Hence we must have 
Po(a+l)R(cx+l) = 0 and po(ct — 1 ) Q(a — 1 ) =0. Let N be a positive integer 
such that po(ot+N) 7 ^ 0 and po(tx— N) 7 b 0. Repeating the argument N times, 
we find R(a+1 ) . . . R(ot+N) = 0 = Q(cx— 1 ) . . . Q(oc— N), contradicting ( 5 . 118 ). 
Therefore po(k) = 1. Similarly Po(k) = 1, so p(k) = P(k). Now q(a) = 0 
implies r(oc + 1) 7 ^ 0, by ( 5 . 118 ), hence q(k)\Q(k). Similarly Q(k)\q(k), 
so q(k) = Q(k) since they have the same leading coefficient. That leaves 
r(k) =R(k). 

5.96 If r(k) is a nonzero rational function and T(k) is a hypergeometric term, 
then r(k)T(k) is a hypergeometric term, which is called similar to T(k). (We 
allow r(k) to be 00 and T(k) to be 0, or vice versa, for finitely many values 
of k.) In particular, T(k+ 1) is always similar to T(k). If Ti (k) and T 2 (k) are 
similar hypergeometric terms, then T] (k) + T 2 (k) is a hypergeometric term. 
If Ti (k), . . . , T m (k) are mutually dissimilar, and m > 1 , then T| (k) + • • • + 
T m (k) cannot be zero for all but finitely many k. For if it could, consider 
a counterexample for which m is minimum, and let Tj(k) = Tj(k+ 1)/Tj(k). 

Since Ti (k) H b T m (k) = 0, we have r m (k)Ti (k) 4 b r m (k)T m (k) = 0 

and r 1 (k)Ti (k) 4 br m (k)T m (k) = (k+ 1) 4 b T m (k+ 1) = 0; hence 

( r m(k) — ri (k))T! (k) + • • • + (r m (k) -r m _-| (k))T m _i (k) = 0. We cannot have 
r m (k) — Tj (k) = 0, for any ) < m, since Tj and T m are dissimilar. But m was 
minimum, so this cannot be a counterexample; it follows that m = 2. But 
then Ti (k) and T 2 (k) must be similar, since they are both zero for all but 
finitely many k. 

Now let t(k) be any hypergeometric term with t(k + l)/t(k) = r(k), 

and suppose that t(k) = (Ti (k4- 1) 4 bT m (k4- 1)) — (Ti (k) 4 fT m (k)), 

where m is minimal. Then Tj , . . . , T m must be mutually dissimilar. Let Tj (k) 
be the rational function such that 

T(k) (Tj (k 4- 1 ) - T, (k)) — (T, (k + 2) — T, (k + 1 )) = r,(k)T,(k). 

Suppose m > 1. Since 0 = r(k)t(k)— t(k4-l ) = ri (k)Ti (k)4-- • -4-T m (k)T m (k), 
we must have Tj(k) = 0 for all but at most one value of j. If rj(k) = 0, the 
function t(k) = Tj (k 4- 1 ) — Tj (k) satisfies t(k 4- 1 )/t(k) = t(k 4- 1 )/t(k). So 
Gosper’s algorithm will find a solution. 

5.97 Suppose first that z is not equal to — d — 1/d for any integer d > 0. 
Then in Gosper’s algorithm we have p ( k) = 1, q(k) = (k 4- I) 2 , r(k) = 
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k 2 + kz + 1. Since deg(Q) < deg(R) and deg(p) — deg(R) + 1 = — 1, the 
only possibility is z = d + 2 where d is a nonnegative integer. Trying s(k) = 
a^k 11 + • ■ • + <xo fails when d = 0 but succeeds whenever d > 0. (The linear 
equations obtained by equating coefficients of k d , k d_1 , . . . , k 1 in ( 5 . 122 ) 
express ctd_i , . . . , ao as positive multiples of ocd, and the remaining equation 
1 = ad + • • ■ + ai then defines cxd-) For example, when z = 3 the indefinite 
sum is (k + 2)k! 2 / n Jr , 1 (j 2 + 3) + 1 ) + C. 

If z = — d — 1/d, on the other hand, the stated terms t(k) are infinite 
for k ^ d. There are two reasonable ways to proceed: We can cancel the zero 
in the denominator by redefining 

= k ! 2 = (d 1 /d)!k ! 2 

n!U + ,0 2 -j(d + l/d) + l) (k — 1 /d)! (k — d)! ’ 


thereby making t(k) =0 for 0 ^ k < d and positive for k ^ d. Then Gosper’s 
algorithm gives p(k) = k— , q(k) = k + 1 , r(k) = k — 1/d, and we can solve 
( 5 . 122 ) for s(k) because the coefficient of k’ on the right is (j + 1 + 1 /d )ck) 
plus multiples of {oq + i , . . . , otd}- For example, when d = 2 the indefinite sum 
is (3/2)! k! (§ k 2 - ff k + ^)/(k - 3/2)! + C. 

Alternatively, we can try to sum the original terms, but only in the 
range 0 ^ k < d. Then we can replace p(k) = k— by 


fd-Ji 


p'(k) = 2J-1 j“ + 

j=i 


k>- 


Look, any finite 
sequence is triv- 
ially summable, 
because we can find 
a polynomial that 
matches t(k) for 
OsC k< d. 


This is justified since ( 5 . 117 ) still holds for 0 ^ k < d — l;we have p'(k) = 
lim e ^o ((k+ e)— — k— ) /e = lim e ^o(l c + e)— / e, so this trick essentially cancels 
a 0 from the numerator and denominator of ( 5 . 117 ) as in L’Hospital’s rule. 
Gosper’s method now yields an indefinite sum. 


5.98 nSn+i = 2nS n . (Beware: This gives no information about Si /So-) 

5.99 Let p(n,k) = (n+1 +k)(3 0 (n) + (n+ 1 + a + b + c + k)(3i (n) = p(n,k), 
t(n,k) = t(n, k)/(n + 1 + k), q(n,k) = (n +1 + a + b + c + k)(a — k)(b - k), 
r(n, k) = (n+1 + k) (c + k)k. Then ( 5 . 129 ) is solved by |3o(n) = (n + 1 + a + 
b + c)(n+ 1 + a + b), (3 1 (n) = — (n + 1 + a)(n+ 1 +b), a 0 (n) = s(n,k) = — 1 . 

We discover ( 5 . 134 ) by observing that it is true when n = —a and using 
induction on n. 


5.100 The Gosper-Zeilberger algorithm discovers easily that 
n + 2 2 n + 2 n — k n +1 — k 



0 ^ k < n. 
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Summing from k = 0 to n— 1 yields (n+2) (S n — 1 ) — (2n+2)(S n+ i — 1 — i ^ r y ) = 
— n. Hence (2n + 2)S n+ i = (n + 2)S n +2n + 2. Applying a summation factor 
now leads to the expression S n = (n + 1 )2 ~ n ~ 1 ^ k +, 2 k /k. 

5.101 (a) If we hold m fixed, the Gosper-Zeilberger algorithm discovers that 

(n + 2 )S m ,Tt+ 2 (z) = (z — 1 )(n+ 1 )S m , n (z) + (2n + 3 — z(n — m + 1 ))S m , n +i (z). 

We can also apply the method to the term 

|3o(m,n.)t(m,n,k) + (3 1 (m, n)t(m+1,n, k) + (3 ^ (ttl, n)t(m, n+1,k) , 


in which case we get a simpler recurrence, 


(m+ l)S m+ i, n (z) - (n+ l)S m , n+ i(z) = (1 - z)(m-n)S m , n (z) . 


(b) Now we must work a little harder, with five equations in six unknowns. 
The algorithm finds 


(n + 1 )(z — 1 ) 2 

+ (n + 2 ) 


z k — (2n + 3)(z+ 1 ' 

n + 2 N 2 
k 


n+1 
k 

= T(n,k+1)-T(n,k), 


T(n,k) 

s(n,k) 


/n+ 1 \ 2 s(njc] k 
\k — 1 J n +1 Z ’ 

(z— l)k 2 — 2 ((u+ 2 )z— 2 n— 3)k+ (n+ 2 )((ri+ 2 )z— 4n— 5) . 


Therefore (n+ 1 )(z— 1 ) 2 S n (z) — (2n + 3)(z+ 1 )S n+ i (z) + (n + 2 )S n+ 2 (z) = 0. 

Incidentally, this recurrence holds also for negative n, and we have S_ n _i (z) = 
Sn(z)/(1 — z) 2n+1 . 

The sum S n (z) can be regarded as a modified form of the Legendre 
polynomial P n (z) = ( k )“(z~l ) n_k (z+ 1 ) k /2 rl , since we can write S n (z) = 

(1 -z) n P n ({^§). Similarly, S m>n (z) = (1 - z) n P^ 0,m_n) (|^|) is a modified 
Jacobi polynomial. 

5.102 The sum is F(a— |n, — n; b — |n; — z), so we need not consider the case 
How about z = 0? z = — 1. Let n = 3m. We seek solutions to ( 5 . 129 ) when 


p(m, k) = (3m + 3 — k)-(m + 1 — k)|3o + (4m + 4 — b — k)-(3i , 
q(m,k) = (3m + 3 — k) (m + 1 — a — k)z, 
r(m, k) = k(4m+l— b — k), 
s(m,k) = a 2 k 2 + aik+ ao . 


The resulting five homogeneous equations have a nonzero solution (ao, ocj , 0 C 2 , 
|3o, |3i ) if and only if the determinant of coefficients is zero; and this deter- 
minant, a polynomial in m, vanishes only in eight cases. One of those cases 
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is, of course, (5.113); but we can now evaluate the sum for all nonnegative 
integers n, not just n ^ 2 (mod 3): 



Here the notation [co,Ci,C2] stands for the single value c n mo d 3. Another 
case, (a, b,z) = (2,0,8), yields the identity 



(This sum, amazingly, is zero unless n is a multiple of 3; and then the identity 
can be written 


which might even be useful.) The remaining six cases generate even weirder 
sums 



where the respective values of (a, b, z, Co, Ci , C2, a', b', x) are 


7 1 

12’ 3’ 

8,1, 

- 1 , 0, 

?,0, 

64); 

( l 0, 

8,1,2, 

0, 12 ’ 3 ’ 

64); 

5 2 
12’ 3’ 

8,1, 

0,-3, 

1,0, 

64); 

f_L 1 

1 12 ’ 3 ’ 

8,1,3, 

0, f,0, 

64); 

b 0’ 

-4,1, 

2, 0, 

1 1 

6 > 3 > 

-16); 

( 1 2 

6 ’ 3 ’ 

-4,1,0, 

-3, |,0, 

-16). 


5.103 We assume that each a[ and b{ is nonzero, since the corresponding 
factors would otherwise have no influence on the degrees in k. Let t(n, k) = 
p(n, k)t(n, k) where 


nLi( a ^ + a !k+ a i l [ a i<fl] + a ()! . 
t(n,k) = z. 

n?=i ( b t n + b ( k + b i i [ b t > °] + k ) ! 

Then we have deg(p) = deg(f) + max(£j? =1 bt [bi > 0] — Y i = 1 a i [ Q i < 0] , 
LLi a t [a t >0] - Yi-^ b i [bt < 0]) p deg(f) + ^(lail H + |a p | + |bi| + 
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•'■ + |b q |), except in unusual cases where cancellation occurs in the lead- 
ing coefficient. And deg(q) = ^Z[ =1 a([a(>0] — ^? =1 b{[b{<0], deg(r) = 
L?=i KW > 0] Hi =1 a.{ [a.{ < 0], again except in unusual cases. 

(These estimates can be used to show directly that, as l increases, the 
degree of p eventually becomes large enough to make a polynomial s(n, k) 
possible, and the number of unknown oq and |3j eventually becomes larger 
than the number of homogeneous linear equations to be solved. So we obtain 
another proof that the Gosper-Zeilberger algorithm succeeds, if we argue as 
in the text that there must be a solution with |3o(tl), . . . , (3 r ( tl) not all zero.) 

5.104 Let t(n, k) = (—1 ) k (r— s— k)! (r— 2k) !/((r — s— 2k)! (r— n—k+1 )! (n— k)! 
k!). Then (3o(n.)t(n, k) + |3i (n)t(n+ 1 , k) is not summable in hypergeometric 
terms, because deg(p) = 1, deg(q — r) =3, deg(q + r) = 4, A = —8, A' = —4; 
but |3o(Ti)t(n, k) + |3i (n)t(n + 1 , k) + |32(n.)t(n + 2, k) is — basically because 
A' = 0 when q (n, k) = — (r — s — 2k) (r — s — 2k — 1 ) (n + 2 — k) (r — n — k + 1 ) 
and r(k) = (r — s — k + 1 )(r — 2k + 2)(r — 2k + 1 )k. The solution is 

|3 0 (n) = (s -n)(r-n+ l)(r-2n+ 1) , 

|3i(n) = (rs — s 2 — 2rn + 2n 2 — 2r + 2n)(r — 2n— 1) , 

02 (n) = (s — r + n + 1 )(n + 2)(r - 2 n- 3) , 
ao(n) = r — 2n — 1 , 

and we may conclude that (3o(n.)S n + (3 1 (n)S n +i + |32(n.)S n+ 2 = 0 when S n 
denotes the stated sum. This suffices to prove the identity by induction, after 
verifying the cases n = 0 and n = 1 . 

But S n also satisfies the simpler recurrence j3o(TL)Sn + 3i (n)Sn.+i = 0, 
where j3 0 (n.) = (s — n) (r — 2n + 1 ) and j3 1 (n) = — (n + 1 ) (r — In — 1). 
Why didn’t the method discover this? Well, nobody ever said that such a 
recurrence necessarily forces the terms j3o(n.)t(n, k) + j3i (n)t(n + 1 , k) to be 
indefinitely summable. The surprising thing is that the Gosper-Zeilberger 
method actually does find the simplest recurrence in so many other cases. 
Notice that the second-order recurrence we found can be factored: 

Po(n) + Pi(n)N + p 2 (n)N 2 

= ((r-n+l)N + (r-s-n-l)) (j3 0 (n) + pi (n)N) , 

where N is the shift operator in (5.145). 

5.105 Set a = 1 and compare the coefficients of z 3n on both sides of Henrici’s 
“friendly monster” identity, 
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where f (a, z) = F(1 ; a, 1 ; z). The identity can be proved by showing that both 
sides satisfy the same differential equation. 

Peter Paule has found another interesting way to evaluate the sum: 



using the binomial theorem, Vandermonde’s convolution, and the fact that 
[z°]g(az) = [z°]g(z). We can now set N = 3n and apply the Gosper-Zeilberger 
algorithm to this sum S n , miraculously obtaining the first-order recurrence 
(n + 1 ) 2 S n+ i = 4(4n + 1 )(4n + 3)S n ; the result follows by induction. 

If 3n is replaced by 3n + 1 or 3 n + 2, the stated sum is zero. Indeed, 
Hk+i+m=N t(lc, l, m)a> l ~ m is always zero when N mod 3 ^ 0 and t(k, l, m) = 
t(l, m, k). 

5.106 (Solution by Shalosh B Ekhad.) Let 


T(r,j,k) 

U(r,j,k) 


(( 1 +n.+s)( 1 +r) - ( 1 +n+r)j + (s-r)k)(j-l)j . 

(I — m + n - r + s)(n + r + 1)(j - r — l)(j + k) ’ ’ ’ 

(s + n+ 1 )(k + l)k 

(l - m + n - r + s)(n + r + l)(j + k) 


The stated equality is routinely verifiable, and ( 5 . 32 ) follows by summing 
with respect to j and k. (We sum T(r,j + 1 , k) — T(rJ,k) first with respect 
to j, then with respect to k; we sum the other terms U(r,j,k+ 1) — U(r,j,k) 
first with respect to k, then with respect to j.) 
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Well, we also need to verify (5.32) when r = 0. In that case it reduces 
via trinomial revision to £ k (-1 ) k (n+i) O 0 + m~ k ) = (- 1 ) l (n+i) (m-n-i)' 
We are assuming that l, m, and n are integers and n 0. Both sides are 
clearly zero unless n + l ^ 0. Otherwise we can replace k by n — k and use 
(5.24). 


Noticee that 1 /nk 
is proper, since it’s 
(n-1)!(k— 1)1/ 
n! k! . Also 
1 / (n 2 — k 2 ) is 
proper. But 
1 / (n 2 + k 2 ) isn ’t. 


5.107 If it were proper, there would be a linear difference operator that an- 
nihilates it. In other words, we would have a finite summation identity 

1 T 

LL ai,j(n)/((n + i)(k + j) + l) = 0, 
i 0 j- 0 

where the a’s are polynomials in n, not all zero. Choose integers i, j, and n 
such that n > 1 and aij(n) ^ 0. Then when k = — 1 /(n + i) — j, the (i, j) 
term in the sum is infinite but the other terms are finite. 


5.108 Replace k by m — k in the double sum, then use (5.28) to sum on k, 
getting 


A 


m,n 



m + n — ) 
m 


2 


trinomial revision (5.21) then yields one of the desired formulas. 

It appears to be difficult to find a direct proof that the two symmetrical 
sums for A m n are equal. We can, however, prove the equation indirectly 
with the Gosper-Zeilberger algorithm, by showing that both sums satisfy the 
recurrence 


(t. 4“ 1 ) A m n f (m, Tr)A m n _|-i -1- (n T 2) A m rL ^_2 — 0 > 
where f(m,n) = (2 n + 3)(n 2 + 3n + 2m 2 + 2m + 3). Setting tj (n, k) = 

(do (\ +l )("i k ) = wrTrrO') 2 , 

(n+ 1 ) 2 tj (n, k) — f(m,n)tj(n + 1,k) + (n + 2) 2 tj (n + 2,k) 

= Tj (n, k + 1 ) - Tj (n, k) , 

where Ti (n, k) = — 2(2n + 3)k 4 ti (n, k)/(n +1 — k)(n + 2 — k) and T2(n, k) = 
— ((n + 2)(4mn + n + 3m 2 + 8m + 2) — 2(3mn + n + m 2 + 6m + 2)k + 
(2m + 1 )k 2 )k 2 (m+n+l — k) 2 t2(n, k)/(n+2— k) 2 . This proves the recurrence, 
so we need only verify equality when n = 0 and n = 1 . (We could also have 
used the simpler recurrence 

m- 3 A tn , n _ 1 -n 3 A m _!, n = (m - n)(m 2 + n 2 - mn)A m _, >n _! , 
which can be discovered by the method of exercise 101.) 
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The fact that the first formula for A m n equals the third implies a 
remarkable identity between the generating functions Y_ m n A m n w m z n : 

v- w k S k (z) 2 _ /2k\ 2 w k z k 

Z_ (l _ z )2k+i - 2 - (1 - w )2k+i (1 - z )2k+1 ’ 

lc lc 

2 

where S k (z) = ( k ) zb It turns out, in fact, that 

v— w k S k (x)S k (ij) _ y- /2k\ w k Y-) (j) x ^y 

2- (1 -x) k (1 — ■y) k 2— (1 -w) 2k+1 (1 x) k ( 1 -q) k ’ 

this is a special case of an identity discovered by Bailey [19]. 

5.109 Let X n = ( k ) a ° ( n k k ) ' • • • ( n ^ lk ) l * k for any positive integers 

do, ai , . . . , <q, and any integer x. Then if 0 m < p we have 


X 


m+pn 



/m + pn + 10 + pk) 
V i +P k 


ai 

x i+pk 


X m X 


rt 



And corresponding terms are congruent (mod p), because exercise 36 implies 
that they are multiples of p when Ij + m ^ p, exercise 61 implies that the 
binomials are congruent when Ij + m < p, and (4.48) implies that x p = x. 

5.110 The congruence surely holds if 2n+ 1 is prime. Steven Skiena has also 
found the example n = 2953, when 2n + 1 = 3 • 11 • 1 79. 

5.111 See [96] for partial results. The computer experiments were done by 
V. A. Vyssotsky. 

5.112 If n is not a power of 2, ( 2 r [ l ) is a multiple of 4 because of exercise 36. 
Otherwise the stated phenomenon was verified for n T 2 22000 by A. Granville 
and O. Ramare, who also sharpened a theorem of Sarkozy [317] by showing 
that ( 2 T ] t ) is divisible by the square of a prime for all n > 2 22000 . This 
established a long-standing conjecture that ( 2 r [ l ) is never squarefree when 
n > 4. 

The analogous conjectures for cubes are that ( 2 ^) is divisible by the 
cube of a prime for all n > 1 056, and by either 2 3 or 3 3 for all n > 2 29 + 2 23 . 
This has been verified for all n < 2 10000 . Paul Erdos conjectures that, in 
fact, maxp ep(( 2 ] 1 )) tends to infinity as u -) 00; this might be true even if 
we restrict p to the values 2 and 3. 


Ilan Vardi notes 
that the condi- 
tion holds for 
2 n + 1 = p 2 , 
where p is prime, 
if and only if 
2 P ~' mod p 2 = 1 . 
This yields two 
more examples: 
n = (1 093 2 — 1 )/2; 
n= (351 1 2 — 1 )/2. 
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5.113 The theorem about generating functions in exercise 7.20 may help re- 
solve this conjecture. 

5.114 Strehl [344] has shown that Cn 2 ' = 22 k ( k ) 3 = ^ k ( k ) 2 ( 2 r J c ) is a so- 

called Franel number [132], and that c„ 1 = (£) 2 ( 2 1 ^) 2 ( n 2k k )- In another 

direction, H. S. Wilf has shown that c !] 11 is an integer for all m when n ^ 9. 

6.1 2314, 2431, 3241, 1342, 3124, 4132, 4213, 1423, 2143, 3412, 4321. 

6.2 { k }m-, because every such function partitions its domain into k non- 
empty subsets, and there are m— ways to assign function values for each 
partition. (Summing over k gives a combinatorial proof of ( 6 .io).) 

6.3 Now d k +i ^ (center of gravity) — e = 1 — e + (di + • • • + d k )/k. This 
recurrence is like ( 6 . 55 ) but with 1 — e in place of 1 ; hence the optimum 
solution is d k +i = (1 — e)H k . This is unbounded as long as e < 1. 

6.4 H 2n+1 - jH n . (Similarly (-1 = H 2n - H n .) 

6.5 U n (x,y) is equal to 

OHf-’k-'lx+k^l-'+i)!^, (k ) (— ! ) k_1 (x+ky) n -i , 


and the first sum is 


U 


n— 1 lx,y 


+ L 

k>1 


n - 1 
k- 1 


(~1 ) k_1 k _1 (x + ky ) 


n— 1 


The remaining k. 1 can be absorbed, and we have 


Z 

k>l 


(-1 


vk-1 


x+ky) 


n— 1 


= X 




k >0 


(-1 


,k-l 


x+ky) 


n— 1 


= X 


n-1 


This proves ( 6 . 75 ). Let R n (x,y) = x~ n U n (x,y); then Ro(x,y) = 0 and 
Rn(*,y) = Rn-i ( x .y) + V n +y/ x i hence R n (x,y) = H n +ny/x. (Incidentally, 
the original sum U n = U n (n, — 1) doesn’t lead to a recurrence such as this; 
therefore the more general sum, which detaches x from its dependence on n, 
is easier to solve inductively than its special case. This is another instructive 
example where a strong induction hypothesis makes the difference between 
success and failure.) 


The Fibonacci re- 
currence is additive, 
but the rabbits are 
multiplying. 


6.6 Each pair of babies bb present at the end of a month becomes a pair 
of adults aa at the end of the next month; and each pair aa becomes an 
aa and a bb. Thus each bb behaves like a drone in the bee tree and each aa 
behaves like a queen, except that the bee tree goes backward in time while the 
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rabbits are going forward. There are F n+ i pairs of rabbits after n months; 
F n of them are adults and F n _i are babies. (This is the context in which 
Fibonacci originally introduced his numbers.) 

6.7 (a) Set k = 1 — n and apply (6.107). (b) Set m = 1 and k = n— 1 and 

apply (6.128). 


6.8 55 + 8 + 2 becomes 89 + 1 3 + 3 = 1 05; the true value is 1 04.607361 . 

6.9 21. (We go from F n to F n+ 2 when the units are squared. The true 
answer is about 20.72.) 

6.10 The partial quotients do, ai, <12, . . . are all equal to 1, because 4> = 
1 + 1 /4>. (The Stern-Brocot representation is therefore RLRLRLRLRL . . . .) 

6.11 (— 1) n = [n = 0] — [n = 1]; see (6.11). 

6.12 This is a consequence of (6.31) and its dual in Table 264. 

6.13 The two formulas are equivalent, by exercise 12. We can use induction. 
Or we can observe that z n D n applied to f(z) = z x gives x— z x while -0 n applied 
to the same function gives x n z x ; therefore the sequence (S 0 ,!) 1 , 0 2 , . . . ) must 
relate to (z°D°, z 1 D 1 , z 2 D 2 , . . . ) as (x°, x 1 , x 2 , . . . ) relates to (x-, xl, x-, . . . ). 


That “true value" 
is the length of 
65 international 
miles, but the in- 
ternational mile 
is actually only 
.999998 as big as 
a U. S. statute mile. 
There are exactly 
6336 kilometers in 
3937 U. S. statute 
miles; the Fibonacci 
method converts 
3937 to 6370. 


6.14 We have 


x + k' 

n 


= (k + r 


x + k 
n + 1 


+ (n — k) 


x + k + 1 
n+1 


because (n+ 1 )x = (k+ 1 )(x + k — n) + (n — k)(x + k+ 1 ). (It suffices to verify 
the latter identity when k = 0, k = — 1 , and k = n.) 

6.15 Since A(( x ^ k )) = ( x ^ k ), we have the general formula 



= A m (x n ) = Y_ 
j 



(x + j) n . 


Set x = 0 and appeal to (6.19). 

6.16 = X!j>o a i{ Tl k ) }> this sum is always finite. 

6-17 (a) |-| = mlj. (b) |2| =nH^ = u![n^k]/k!. (c) |£| = k!{-}. 

6.18 This is equivalent to (6.3) or (6.8). 

6.19 Use Table 272. 

6 - 20 Vi 2 = LtW n+1 = ( n + !) H n ’ - H n- 

6.21 The hinted number is a sum of fractions with odd denominators, so 
it has the form a/b with a and b odd. (Incidentally, Bertrand’s postulate 
implies that b n is also divisible by at least one odd prime, whenever n > 2.) 
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6.22 |z/k(k + z)| ^ 2|z|/k 2 when k > 2|z|, so the sum is well defined when 
the denominators are not zero. If z = n we have -i 0/k — V(k + n)) = 
H m — H m+n + H n , which approaches H n as m — > oo. (The quantity H z _i — y 
is often called the psi function 4>(z).) 

6.23 z/(e z + 1 ) = z/(e z - 1 ) - 2z/(e 2z - 1) = Ln^oO ~ 2 n )B n z7n!. 

6.24 When n is odd, T n (x) is a polynomial in x 2 , hence its coefficients 

are multiplied by even numbers when we form the derivative and compute 
Tn+i (x) by (6.95). (In fact we can prove more: The Bernoulli number B2 n 
always has 2 to the first power in its denominator, by exercise 54; hence 
2 2n — k \\T2 n +i 2 k \\(n+ 1 ). The odd positive integers (n+ 1 )T2 n +i /2 2n 

are called Genocchi numbers (1,1,3,17, 155, 2073, . . . ), after Genocchi [145].) 

6.25 100n-nH n < 100(n- 1) - (n - 1)H n _, <==» H n _, > 99. (The 
least such n is approximately e"~ T , while he finishes at N « e 100 ~ Y , about 
e times as long. So he is getting closer during the final 63% of his journey.) 

6.26 Let u(k) = H k _i and Av(k) = 1 /k, so that u(k) = v(k). Then we have 

Sn-H[, 2) =Lk=l H k _ 1 /k = H 2 _ 1 |^ +1 -S n =H 2 -S n . 

6.27 Observe that when m > n we have gcd(F m ,F rl ) = gcd(F m _ n , F n ) by 
(6.108). This yields a proof by induction. 

6.28 (a) Q n = oc(L n — F n )/2 + (3F n . (The solution can also be written 
Qn = aF n _! + |3F n .) (b) L n = 45- + 

6.29 When k = 0 the identity is (6.133). When k = 1 it is, essentially, 

K(xi , . . . , x n )x TTL = K(xi , . . . , x m ) K(x m , . . . ,x n ) 

— K(xi , . . . , ^m— 2 ) k (x m -f-2 > • • • 1 X n ) , 

in Morse code terms, the second product on the right subtracts out the cases 
where the first product has intersecting dashes. When k > 1, an induction 
on k suffices, using both (6.127) and (6.132). (The identity is also true when 
one or more of the subscripts on K become — 1 , if we adopt the convention that 
K 1 = 0. When multiplication is not commutative, Euler’s identity remains 
valid if we write it in the form 

km-l-n(Xl , • . . , X m _|_ rL ) KjJXm+k, . . . , XtrL+i ) 

= k m _|_ k (xi , . . . , X m -|-k) K n (^m+n> • • • > ^m+1 ) 

+ (-1) k K 

m— 1 (Xl , . . . , X m _] ) K n _ k _i (x m _|_ n , . . . , X m + k +2) • 

For example, we obtain the somewhat surprising noncommutative factoriza- 
tions 

(abc + a + c) ( 1 + ba) = (ab + 1 )(cba + a + c) 
from the case k = 2, m = 0, n = 3.) 
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6.30 The derivative of K(xi , . . . , x n ) with respect to x m is 
K(x,,..., ^m— 1 )K(x m+1 ,...,x n ), 

and the second derivative is zero; hence the answer is 

K(xi , . . . , x n ) + K(xi , . . . , x m _] ) K(x m+1 , . . . ,x n )y . 

6.31 Since x n = (x + n — 1 )— = ( k )x-(n — I)! 1 — 1 ^ we have | k | 

( k ) (n — 1 ) ;r ^— 1 -. These coefficients, incidentally, satisfy the recurrence 


n , , , , n— 1 n— 1 

k " (n_1+k) k + k -1 ’ 


integers n, k > 0. 


both of which appear in Table 265. 

6.33 If n > 0, we have [£] = l(n- 1)! (H^_, - ), by (6.71); {3} = 

l(3 n -3-2 n +3), by (6.19). 

6.34 We have = 1/(k+ 1), (^. 2 ) = H^,, and in general (£) is given 
by (6.38) for all integers n. 

6.35 Let n be the least integer > 1/e such that |_H n J > LHn-lJ- 

6.36 Now dk+i = (100+ (1 +di ) + ••• + (1 +dk))/(100 + k), and the solution 
is dk+i = Hk+100 — Hi 01 +1 for k ^ 1. This exceeds 2 when k 176. 

6.37 The sum (by parts) is H mn - (77 + 7^ H f ^7) = H mn - H n . The 

infinite sum is therefore In m. (It follows that 

V 'Vm(k) m 

k(kTTT = m^T lnm ’ 

k^l 

because v m (k) = (m — 1 ) X^>i (k mod m))/m) .) 

6.38 (“1) k (( T k 1 ) r_1 - (k-i) H k) + C - ( B y parts, using (5.16).) 

6.39 Write it as Hi<j <T1 j _1 Lj<k<n and sum first on k via (6.67), to 


(n + 1 )H 2 — (2n + 1 )H n + In . 

6.40 If 6n — 1 is prime, the numerator of 


4n -- 1 f-Mk-l 


= H4 n _i - bhn — 1 
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“Let p be any old 
prime.” 

(See [171], p. 419.) 


is divisible by 6n — 1 , because the sum is 


4n— 1 


L 

k— 2n 


1 

k 


3n — 1 

L 

k=2n 


+ 


6 n — 1 — k 


3n-l 


= L 


k=2n 


6 n — 
k(6n — 1 




Similarly if 6n + 1 is prime, the numerator of £^=-1 (— 1 ) k '/k = H4 n — H2 n 
is a multiple of 6n + 1 . For 1 987 we sum up to k = 1 324. 

6.41 S n+ i = Lk ( L(n+1 + k)/2J ) = ^k( [( \ +k |/2J), hence we have S n+1 + 
S n = Lk ( L(n+k ^ /2+1 J) = S n+2 . The answer is F n+2 . 

6.42 F n . 

6.43 Set z = jq in 2In>o^ nZn = z/0 — z — z 2 ) to get The sum is a 
repeating decimal with period length 44: 


0. 1 1 235 95505 61 797 75280 89887 64044 94382 02247 19101 1 2359 55+ . 


6.44 Replace (m, k) by (— m, — k) or (k, — m) or (— k,m), if necessary, so 
that m ^ k 0. The result is clear if m = k. If m > k, we can replace (m, k) 
by (m — k, m) and use induction. 

6.45 X n = A(n)a+B(n)(3 + C(n)y+D(n)5, where B(n) = F n , A(n) = F n _i, 
A(n) + B(n) — D(n) = 1, and B(n) — C(n) + 3D(n) = n. 

6.46 4^/2 and 4> 1 /2. Let u = cos 72° and v = cos 36°; then u = 2v 2 — 1 and 
v = 1-2 sin 2 18° = 1-2u 2 . Hence u+v = 2(u+v)(v— u), and 4v 2 — 2v— 1 = 0. 
We can pursue this investigation to find the five complex fifth roots of unity: 

4) _1 ± iV 2 + 4) -4) ±1^3 - 4> 

’ 2 2 

6.47 2 n \/5F n = (1 + v/5) n — (1 — v / 5) n , and the even powers of \/5 cancel 

out. Now let p be an odd prime. Then ( 2 j^|_i) = 0 except when k = (p — 1 )/2, 
and = 0 except when k = 0 or k = (p — 1 )/2; hence F p = 5^ p_1 ^ 2 and 

2F p+ i = 1 + 5* p_1 ^ 2 (mod p). It can be shown that 5 ,p_1 ^ 2 = 1 when p 
has the form 1 0k ± 1 , and 5* p_1 ^ 2 = —1 when p has the form 1 0k ± 3. 

6.48 Let Kt j = Kj_t + i (xt, . . . , x, ). Using (6.133) repeatedly, both sides 
expand to (Kp m _ 2 (x m _i + Xm.4-1 ) + Ki im _3)K m -|- 2)rl + Ki ilTL — 2 K m -|_3 )n . 

6.49 Set z = j in (6.146); the partial quotients are 0, 2 F °, 2 F 1 , 2 p2 , ... . 
(Knuth [206] noted that this number is transcendental.) 

6.50 (a) f(n) is even <£=+> 3\n. (b) If the binary representation of n is 
(1 Q1 0“ 2 ... 1 Qm ~ 1 0 Qm ) 2 , where m is even, we have f(n) =K(ai , a 2 , . . . , a m -i )• 
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6.51 (a) Combinatorial proof: The arrangements of {1,2,..., p} into k sub- 

sets or cycles are divided into “orbits” of 1 or p arrangements each, if we 
add 1 to each element modulo p. For example, 

{1,2,4}U{3,5} -4 {2, 3, 5} U {4,1} -4 {3,4, 1}U{5,2} 

-4 {4, 5, 2} U {1,3} -4 {5, 1 , 3} U{2,4} -4 {1 ,2,4} U{3,5} . 

We get an orbit of size 1 only when this transformation takes an arrangement 
into itself; but then k = 1 or k = p. Alternatively, there’s an algebraic proof: 
We have x p = xk + xl and xh = x p — x (mod p), since Fermat’s theorem tells 
us that x p — x is divisible by (x — 0 ) (x — 1 ) . . . (x — (p — 1 )) . 

(b) This result follows from (a) and Wilson’s theorem; or we can use 
x^T = x p /(x - 1 ) = (x p - x)/(x - 1 ) = xP - 1 + x p - 2 + ■ ■ ■ + x. 

(c) We have {P+ 1 } = [P+ 1 ] = 0 for 3 5} k ^ p, then { p + 2 } = [ p + 2 ] = 0 
for 4 ^ k ^ p, etc. (Similarly, we have [ 2p p '] = — {"^p^ 1 } = 1.) 

(d ) pi = P p = L k (-i )p- k p k g] = pp [>] - pp - 1 [/,] + ■ • • + p 3 [§] - 
p 2 [ 2 ] + p [ 1 ] • But p [ 1 ] = p ; . s ° 



is a multiple of p 2 . (This is called Wolstenholme’s theorem.) 

6.52 (a) Observe that H n = H)} + H^ n/p j/p, where H* = [k_Lp]/k. 

(b) Working mod 5 we have H r = (0,1,4, 1,0) for 0 ^ r ^ 4. Thus the 
first solution is n = 4. By part (a) we know that 5\a n =4- 5 \ai n / 5 j; so 
the next possible range is n = 20 + r, 0 SC r ^ 4, when we have H n = 
+ ^H 4 = + 5 H 4 + H r + X!k=i 20/k(20 + k). The numerator of 

H^o, like the numerator of H 4 , is divisible by 25. Hence the only solutions 
in this range are n = 20 and n = 24. The next possible range is n. = 
100 + r; now H n = H}} + 5 H 20 , which is 5 H 20 + H r plus a fraction whose 
numerator is a multiple of 5. If 5 H 20 = tn (mod 5), where m is an integer, 
the harmonic number Hioo+r will have a numerator divisible by 5 if and only 
if m + H t = 0 (mod 5); hence m must be = 0, 1 , or 4. Working modulo 5 we 
find 5 H 20 = 5 H 2 Q + J 5 H 4 = 2 ^H 4 = Y 2 = 3; hence there are no solutions for 
100 ^ n 104. Similarly there are none for 120 ^ n 124; we have found 
all three solutions. 

(By exercise 6.51(d), we always have p 2 \a p _i , p\a p 2 _ p , and p\a p 2 _ 1 , 
if p is any prime 5. The argument just given shows that these are the only 
solutions to p\a n if and only if there are no solutions to p~ 2 H p _i + H r = 0 
(mod p) for 0 ^ r < p. The latter condition holds not only for p = 5 but 
also for p = 13, 17, 23, 41, and 67 — perhaps for infinitely many primes. The 
numerator of H n is divisible by 3 only when n = 2, 7, and 22; it is divisible 


(Attention, com- 
puter programmers: 
Here’s an interest- 
ing condition to 
test, for as many 
primes as you can.) 
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by 7 only when n = 6, 42, 48, 295, 299, 337, 341, 2096, 2390, 14675, 16731, 
16735, and 102728.) 

6.53 Summation by parts yields 

^w(y) ((n+2)Hm+i ' i)_i ) ’ 

6.54 (a) If m ^ p we have S m (p) = S m _( p _-|)(p) (mod p), since k p_1 = 1 
when 1 ^ k < p. Also S p _i (p) = p — 1 = — 1. If 0 < m < p — 1, we can write 


S m (p) = Y 

5=0 


p — 1 m 

i) m Y k ’ = Y 

k=0 j=0 


(-11 


. pi±i 

= o. 

i + i 


(The numerators of 
Bernoulli numbers 
played an impor- 
tant role in early 
studies of Fermat’s 
Last Theorem; see 
Ribenboim [308].) 


(b) The condition in the hint implies that the denominator of Iz-n is not 
divisible by any prime p; hence l2 n must be an integer. To prove the hint, 
we may assume that n > 1 . Then 


2n-2 


[(p-’Mln)] 

n 2 — 


k=0 


2 n- 


1\„ P 2n ~ k 
k 2n+l 


is an integer, by (6.78), (6.84), and part (a). So we want to verify that none 
of the fractions ( 2n k h1 )B] < p 2n_l< /(2rL + 1) = ( 2 | ) l )B| C p 2n_k /(2ri — k + 1) has a 
denominator divisible by p. The denominator of ( 2 | ( l )Bi c p isn’t divisible by p, 
since B^ has no p 2 in its denominator (by induction); and the denominator 
of p 2n ~ k ~V(2n — k + 1 ) isn’t divisible by p, since 2n — k + 1 < p 2n ~ k when 
k 2n— 2; QED. (The numbers l2 n are tabulated in [224]. Hermite calculated 
them through Its in 1875 [184], It turns out that I2 = I4 = 16 = Is = 
I10 = Ii 2 = 1; hence there is actually a “simple” pattern to the Bernoulli 
numbers displayed in the text, including 2730 (0- But the numbers l2 n don’t 
seem to have any memorable features when 2n > 12. For example, B24 = 
— 86579 — j — j — 5 — 7 — P31 anc i 86579 is prime.) 

(c) The numbers 2—1 and 3—1 always divide 2n. If n is prime, the only 
divisors of In are 1 , 2, n, and 2n, so the denominator of B2 n for prime n > 2 
will be 6 unless 2n+l is also prime. In the latter case we can try 4n+3, 8n+7, 
. . . , until we eventually hit a nonprime (since n divides 2 n_1 n + 2 n ~' — 1). 
(This proof does not need the more difficult, but true, theorem that there are 
infinitely many primes of the form 6k + 1.) The denominator of B2n can be 6 
also when u. has nonprime values, such as 49. 

6.55 The stated sum is ( x ] f t n ) ( m +i)> by Vandermonde’s convolution. 

To get (6.70), differentiate and set x = 0. 
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6.56 First replace k n+1 by ((k — m) + m ) n+1 and expand in powers of 
k — m; simplifications occur as in the derivation of ( 6 . 72 ). If m > n or 
m < 0, the answer is (— l) n n! — m n /( n i n m )- Otherwise we need to take the 
limit of ( 5 . 41 ) minus the term for k = m, as x -) — m; the answer comes to 
(-1 ) n u! + (-1 r +1 (> E in + 1 + mH n _ m - mH m ). 

6.57 First prove by induction that the nth row contains at most three 
distinct values A n Bn C n ; if n is even they occur in the cyclic or- 
der [C n , B n , A n , B n , C n ], while if n is odd they occur in the cyclic order 
[C n , B n , A n , A n , B n ] . Also 

A2n+1 = A2n “F B2n j A2n = 2A2n — 1 ] 

B2n+1 = B2n + C2n! B2 n = A2 n _i + B2n-1 ! 

f-2n+l — ^C2n j 02n = B2n — 1 T C2n— 1 • 


It follows that Q n = A n — Cn = F n+ i. (See exercise 5.75 for wraparound 
binomial coefficients of order 3.) 

6.58 (a) Ln>o F n zn = z(1-z)/(1+z)(1-3z+z 2 ) = f ((2-3z)/(1-3z+z 2 )- 

2/(1 — z) ) . (Square Binet’s formula ( 6 . 123 ) and sum on n, then combine terms 
so that (|) and $ disappear.) (b) Similarly, 


n^O 


z(l — 2 z— z 2 ) 

(1 — 4z— z 2 )(l +z— z 2 ) 


1 / 2z 3z \ 

5 \ 1 — 4z— z 2 1 +z— z 2 ) 


It follows that F 2 + 1 —4F 2 —F 2 _ 1 =3(— l) n F n . (The corresponding recurrence 
for mth powers involves the Fibonomial coefficients of exercise 86 ; it was 
discovered by Jarden and Motzkin [194].) 

6.59 Let m be fixed. We can prove by induction on n that it is, in fact, 
possible to find such an x with the additional condition x ^ 2 (mod 4). If x 
is such a solution, we can move up to a solution modulo 3 n+1 because 


F 8 . 3 n-. = 3 n , Fg.an-,-! = 3 n + l (mod 3 n+1 ) ; 


either x or x + 8 • 3 n 1 or x + 1 6 • 3 n 1 will do the job. 

6.60 Fi + 1 , F 2 + 1 , F 3 + 1 , F 4 — 1 , and Fg — 1 are the only cases. Otherwise 
the Lucas numbers of exercise 28 arise in the factorizations 


F 2m + ( — l) m — B m+ i F m _i ; F 2 m +i + (— 1) m — L m F m+ i ; 
F2m — (— 1) m = L m _i F m+1 ; F2m+i — (— 1) m = F m+ iF m . 

(We have F m+n - (-l) n F m _ n = L m F n in general.) 
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6.61 1 /^im = F m _i/F m ~~ ^2m-i /^ 2 m when m is even and positive. The 
second sum is 5/4 — ^ 3 ■ 2 n -^ /^ 3 - 2 n , for n. ]> 1 . 

6.62 (a) A n = V5A n _! — A n _2 and B tl = \/5B n _i — B n _2- Incidentally, 
we also have \/5 A n + B n = 2A n+ i and \/5 B n — A n = 2B n _i . (b) A table of 
small values reveals that 

V5F n , n even; 

L n , n odd. 

(c) B n / A n+ i B n _i / A n = 1 /(F 2n+ 1 H - 1 ) because B n A n B n _i A n -^i = y/5 
and A n A n+ i = V5 (F 2n +i +1 )• Notice that B n /A n+1 = (F n /F n+ i )[n even] + 
(L n /L n +i )[n odd], (d) Similarly, ^£ =1 l/(F 2 k+i -1) = (A 0 /Bi -Ai/B 2 ) + 
■ ■ ■ + (A n _i /B n — A n /B n+ i ) = 2 — An/B n+ i . This quantity can also be 
expressed as (5F n /L n+ i )[n even] + (L n /F n +i )[n odd]. 

6.63 (a) [£] . There are [^Z]] with 7t n = n and (n — 1 ) [ n 1 / 1 ] with 7t n < n. 
(b) (k)- Each permutation pi . . . p n _i of {1 ,..., n— 1} leads to n permutations 
7ti 7t2 . . . 7t n = pi . . . Pj_i n Pj + i . . . p n _i Pj. If pi . . . p n -i has k excedances, 
there are k+ 1 values of j that yield k excedances in 7ti 7T2 . . . 7t n ; the remaining 
n— 1 — k values yield k+ 1 . Hence the total number of ways to get k excedances 
in 7Ti7t 2 . . .7t n is (k+ l)( n ^ 1 ) + ((n- 1) - (k- l])^!, 1 ) = (£)■ 

6.64 The denominator of (Yn) 2 4n ~~' , A n *, by the proof in exercise 5.72. 
The denominator of [ 1 1 , / Z n ] is the same, by (6.44), because ((q)) = I and 
((£)) is even for k > 0. 

6.65 This is equivalent to saying that (£)/n! is the probability that we 
have |_Xi + ■ ■ ■ + x n J = k, when xi , . . . , x n are independent random numbers 
uniformly distributed between 0 and 1. Let pj = (xi + ■ ■ ■ +Xj) mod 1. Then 
y 1 , . . . , y n are independently and uniformly distributed, and |_xi + • • ■ + x n J 
is the number of descents in the p’s. The permutation of the p’s is random, 
and the probability of k descents is the same as the probability of k ascents. 

6.66 2 n+1 (2 n+1 — 1)B n+ i/(n + 1), if n > 0. (See (7.56) and (6.92); the 
desired numbers are essentially the coefficients of 1 — tanhz.) 

6.67 ItisL t ({i;,)(k+l)! + K}ld)Cr‘)(-l)"'- l = j; t {aid(-l)">- | ‘x 

(CA)-( n „7™ k )) =E43kH-iP +, - k („2A,) by (6.3) and 

(6.40). Now use (6.34). (This identity has a combinatorial interpretation [59].) 

6.68 We have the general formula 


f L n , n even; 
\V5F n , n odd; 



n + m + 1 — kl 
m + 1 — k J 


(- 1 )” 


for n > m 5; 0, 
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analogous to ( 6 . 38 ). When m = 2 this equals 



{TH^ + "{ n 2 2 M 2n 2 +, ){ n l' 

j3 n+2 - (2n + 3)2 n+1 + \ (4 n 2 + 6n + 3) . 


6.69 jn(n+ 2 )(n+ 1 )(2H2n ^H n ) — jgn(10n 2 + 9n— 1 ). (It would be nice 
to automate the derivation of formulas such as this.) 

6.70 1 /k — 1 /(k + z) = z/k 2 — z 2 /k 3 + • • • , which converges when |z| < 1 . 

6.71 Note that Ov-i (1 + z/k)e~ z/k = ( n+z )ri.- z e (lnn - H - )z . If f(z) = 
£(z!) we find f(z)/z! +y = H z . 

6.72 For tan z, we can use tan z = cot z — 2 cot 2z (which is equivalent to the 
identity of exercise 23). Also z/sinz = zcotz + ztan has the power series 

Ln Ss o(- 1 ) n_1 (4 n -2)B 2n z 2 V(2n)!; and 


In 


tanz 


smz 

In In cos z 


= Ih: 

nz> 1 

- Ih: 


L 4-B 2 n z 2 - 

(2n)(2n)! 


- Ih: 

n^l 

1 4 n ( 4 n — 2)B 2 n z 2n 


L 4 n (4 n -l)B 2 n z 2n 

(2n)(2n)! 


n> 1 


(2n)(2n)! 


because In sin z = cot z and In cos z = — tan z. 

dz dz 

6.73 cot(z + 7t) = cotz and cot(z + jTt) = —tanz; hence the identity is 
equivalent to 


cot z 


^ 2 n -l 

— — y cot 

in Z 


Z + k7t 


k=0 


which follows by induction from the case n = 1 . The stated limit follows since 
zcot z — > 1 as z — > 0. It can be shown that term-by-term passage to the limit 
is justified, hence ( 6 . 88 ) is valid. (Incidentally, the general formula 


cot z 


1 

n 


n— 1 

y~ cot 

k=0 


Z + k7t 

n 


is also true. It can be proved from ( 6 . 88 ), or from 


1 


e nz _ -| 


1 

n 


n— 1 

L 


k=0 


1 

gz+2k.7ri/n "j ’ 


which is equivalent to the partial fraction expansion of 1 /(z n — 1 ).) 
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6.74 Since tan2z + sec2z = (sinz + cos z)/(cosz — sinz), setting x = 1 
in ( 6 . 94 ) gives T n (1) = 2 n T n when n is odd, T n (1) = 2 n |E n | when n is 
even, where 1 /cosz = )T n>0 l^ 2 n |z 2 n /( 2 n)!. (The coefficients E n are called 
secant numbers-, with alternating signs they are called Euler numbers, not 
to be confused with the Eulerian numbers (£). We have (Eo, E2, E4, . . . ) = 
(1,-1, 5, -61 , 1385, -50521 , 2702765, ...).) 

6.75 Let G(w,z) = sinz/ cos (w + z) and H(w,z) = cosz/ cos(w + z), and let 
G(w, z) + H(w, z) = X!m n A m>n w m z n /rrU n!. Then the equations G(w,0) = 

0 and (jp — gp)G(w,z) = H(w, z) imply that A m o = 0 when m is odd, 
A m ,n+i = A m+ i n + A miTl when m + n is even; the equations H(0,z) = 1 
and (-gp — -jp)H(w,z) = G(w,z) imply that Ao, n = [tl = 0] when n is even, 
A m +i,n = A m n+ i + A m n when m + n is odd. Consequently the nth row 
below the apex of the triangle contains the numbers A n o, A n _i j , . . . , Ao, n . 

At the left, A n o is the secant number |E n |; at the right, Ao, n = T n + [n = 0]. 

6.76 Let A n denote the sum. Looking ahead to equation ( 7 . 49 ), we see 
that LnAnzVu! = (T nk (— 1 ) k {^} 2 n_k k! z n /n! = £ k (-l ) k 2 - k (e 2z -l ) k = 

2 /(e 2z + 1 ) = 1 — tanhz. Therefore, by exercise 23 or 72, 

A n = ( 2 n+1 — 4 rL+1 )B n+ i /(n + 1 ) = (— 1 ) (n+ 1 ^ / 2 T n + [n = 0] . 


6.77 This follows by induction on m, using the recurrence in exercise 18. It 
can also be proved from ( 6 . 50 ), using the fact that 


(— l) m - 1 (m — 1)! 
(e z - l) m 


(D + 1 ) m_1 — — ! — - 

e z — 1 


m— 1 


L 


■ m -I d m-k-l 1 

m — k dz m_k_1 e z — 1 


integer m > 0 . 


The latter equation, incidentally, is equivalent to 


d m 1 
dz m e z — 1 


M) m X; 

k 


f m + 1 

l K 


(k- 1)! 

(e z — 1 ) k ’ 


integer m/ 0 . 


6.78 If p(x) is any polynomial of degree ^ n, we have 

Pb0 = Ip(-kl(/)(//), 


because this equation holds for x = 0, — 1, . . . , — n. The stated identity is 
the special case where p(x) = xcr n (x) and x = 1. Incidentally, we obtain 
a simpler expression for Bernoulli numbers in terms of Stirling numbers by 



560 ANSWERS TO EXERCISES 


setting k = 1 in ( 6 . 99 ): 



6.79 Sam Loyd [256, pages 288 and 378] gave the construction 



and claimed to have invented (but not published) the 64 = 65 arrangement 
in 1858. (Similar paradoxes go back at least to the eighteenth century, but 
Loyd found better ways to present them.) 

6.80 We expect A m /A m _i « cf>, so we try A m _i = 61 8034 + r and A m _2 = 
381966— r. Then A m _3 = 236068+2r, etc., and we find A m _is = 144— 2584r, 
A m _i 9 = 154 + 41 81 r. Hence r = 0, x = 154, y = 144, m = 20. 

6.81 If P(F n+ i , F n ) = 0 for infinitely many even values of n, then P(x,y) is 
divisible by U(x,y) — 1, where U(x,y) = x 2 — xy — y 2 . For if t is the total 
degree of P, we can write 

t 

F’(x,y) = ^q k xV~ k + Y_ r i,kx’y k = Q(x,y) + R(x,y) . 

k=0 j+k<t 


Then 


P(Fn+1»F n ) 


t 


Y_ 4k 



+ 0(1/F n ) 


and we have ]T |) =0 4k4 )k = 0 by taking the limit as n — » 00 . Hence Q(x,y) 
is a multiple of U(x,y), say A(x,y)U(x,y). But U(F n+ i,F n ) = (— l) n and 
n is even, so Po(x,y) = P(x,y) — (U(x,y) — l)A(x,y) is another polynomial 
such that Po(F n+ i , F n ) = 0. The total degree of Po is less than t, so Po is a 
multiple of U — 1 by induction on t. 

Similarly, P(x,y) is divisible by U(x,y) + 1 if P(F n+ i,F n ) = 0 for 
infinitely many odd values of n. A combination of these two facts gives the 
desired necessary and sufficient condition: P(x,y) is divisible by U(x,y ) 2 — 1. 
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Exercise: m o n = 
ran + 

L(m+1 )/4 >Jti+ 
TTL L(n.+ 1 )/ct>J . 


6.82 First add the digits without carrying, getting digits 0, 1, and 2. Then 
use the two carry rules 

0 (d+1 ) (e-Fl ) — > 1 d c , 

0 (d+2) 0 e -> 1 d0(e + 1) , 


always applying the leftmost applicable carry. This process terminates be- 
cause the binary value obtained by reading (b m . . . b 2 )p as (b m . . . ^ 2)2 in- 
creases whenever a carry is performed. But a carry might propagate to the 
right of the “Fibonacci point”; for example, (1 )? + (l )f becomes (10.01 )p. Such 
rightward propagation extends at most two positions; and those two digit po- 
sitions can be zeroed again by using the text’s “add 1” algorithm if necessary. 

Incidentally, there’s a corresponding “multiplication” operation on 
nonnegative integers: If m = Fj, +■ ■ -+Fj and n = Fk, +■ ■ - + Fk r in the Fibo- 
nacci number system, let mo u = )r£ =1 Y. ' c - 1 Fj b+ k c , by analogy with mul- 
tiplication of binary numbers. (This definition implies that m o n ss \/5 mu. 
when m and n are large, although 1 on« (J) 2 u.) Fibonacci addition leads to 
a proof of the associative law lo(mon) = (lom)on. 

6.83 Yes; for example, we can take 

A 0 = 331635635998274737472200656430763; 

At = 1510028911088401971189590305498785. 


The resulting sequence has the property that A n is divisible by (but un- 
equal to) pic when n mod rrik = W, where the numbers (pk, m.k,Tk) have the 
following 18 respective values: 


(3,4,1) 

(7,8,3) 

(47,16,7) 

(2207,32,15) 

(1087,64,31) 

(4481,64,63) 


(2,3,2) 

(17,9,4) 

(19,18,10) 

(53,27,16) 

(109,27,7) 

(5779,54,52) 


(5,5,1) 

( 11 , 10 , 2 ) 

(61,15,3) 

(31,30,24) 

(41,20,10) 

(2521 , 60, 60) 


One of these triples applies to every integer u; for example, the six triples in 
the first column cover every odd value of n, and the middle column covers all 
even n that are not divisible by 6. The remainder of the proof is based on 
the fact that A m+n = A m F n _i + A m+ iF n , together with the congruences 

A 0 = F mk -r k mod pk , 

Ai = F mk _ Tk+1 mod p k , 
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for each of the triples (Pk, m-k, r k)- (An improved solution, in which Ao 
and A] are numbers of “only” 17 digits each, is also possible [218].) 

6.84 The sequences of exercise 62 satisfy A_ m = A m , B_ m = — B m , and 

AmAn = A m+n T A m _ n , 

AniBix = B m+n B m _n j 
BmBn = A m+n A m _ n . 

Let f k = B mk /A mk +1 and g k = A mk /B m k+i, where l = \{n - m). Then 
fk+1 fk — AiB m / f A2nik+n T A m ) and gk gk+1 — AiB m / ( A2m.k+n A m ); 
hence we have 

S+ 

*^m ,n 

s _ 

,n 


6.85 The property holds if and only if N has one of the seven forms 5 k , 
2-5 k , 4-5 k , 3E5 k , 6-5 k , 7-5 k , 14-5 k . 

6.86 For any positive integer m, let r(m) be the smallest index ) such that 
Cj is divisible by m; if no such ) exists, let r(m) = oo. Then C n is divisible 
by m if and only if gcd(C n , C r ( m )) is divisible by m if and only if C gc d(n,r(m)} 
is divisible by m if and only if gcd(u, r(m)) = r(m) if and only if n is divisible 
by r(m). 

(Conversely, the gcd condition is easily seen to be implied by the con- 
dition that the sequence Ci, C 2 , ■ ■ • has a function r(m), possibly infinite, 
such that C n is divisible by m if and only if n is divisible by r(m).) 

Now let TT(n) = Ci C 2 . . . C n , so that 

/m + riA TT(m + n) 

V rn ) e TT(m)TT(n) ' 

If p is prime, the number of times p divides F[(n) is f p (n) = ]T k>1 L n / r (P k )J i 
since Ln./p k J is the number of elements {Ci , . . . , C n } that are divisible by p k . 
Therefore f p (m + n) f p (m) + f p (n) for all p, and ( m n ( n ) e is an integer. 

6.87 The matrix product is 

{ ^n-2(^2) - • • An.-1 ) lin-l(^2r")^n-l4n] 

\ K n -1 (*1 > *2 > • • • > *n — 1 ) Kn (^1 > ^2 > • • • > *n — 1 > *n) 



V5 


lim (f k -f 0 ) = 


AiBm 

^ lim (g 0 - g k ) 


A[B m k— »oo 




4> l AiL m ’ 

y/5 ( 2_ 

Bi 


AiL, 

2 


FikL m 


-S 


1 

¥ 

m,n • 
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This relates to products of L and R as in (6.137), because we have 


0 1 

1 0 


0 1 

1 a 


0 1 
1 0 


The determinant is K n (xi , . . . , x n ); the more general tridiagonal determinant 

1 0 ... 0 \ 

x 2 1 0 

Vi x 3 1 : 

1 

0 ... x n / 

satisfies the recurrence D n = x n D n _i — PnDn- 2 - 

6.88 Let 0 C 1 = Qo + 1 / (ai + 1 /(a 2 + •••)) be the continued fraction rep- 
resentation of a . Then we have 


/ X1 

V2 

0 

V 0 


oo + 


z 


A 0 (z) 


1 


1 


Ai (z) -| — 

A 2 (z) H — — 


1 — z 


z 


Y_ z [naJ , 

n ^1 


where 


2 Qm+l 2 , ~ 1 

A m (z) = - - , c| m = K m (ai , . . . , a m ) . 

A proof analogous to the text’s proof of ( 6 . 146 ) uses a generalization of Zeck- 
endorf’s theorem (Fraenkel [129, §4]). If z = 1 /b, where b is an integer 5; 2, 
this gives the continued fraction representation of the transcendental number 
(b — 1 ) 2In>i b _ L na J ) as in exercise 49. 

6.89 Let p = K(0, Qi , a 2 , . . . , a m ), so that p/n is the mth convergent to the 
continued fraction. Then a = p/n + (— 1 ) m /nq, where q = K(ai , . . . , a m , (3) 
and (3 > 1 . The points {koc} for 0 ^ k < n can therefore be written 

0 1 (-1) m 7t! n — 1 (- 1 TX _1 

n ’ n nq ’ ' ' ’ ’ n nq 

where 7 ti . . . 7 t n _i is a permutation of (1 , . . . , n — 1}. Let f (v) be the number 
of such points < v; then f (v) and vn both increase by 1 when v increases from 
k/n to (k + 1 )/n, except when k = 0 ork = n— l,so they never differ by 2 


or more. 
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6.90 By (6.139) and (6.136), we want to maximize K(a-| , . . . , a m ) over all 
sequences of positive integers whose sum is ^ n + 1 . The maximum occurs 
when all the a’s are 1 , for if ) ^ 1 and 1 we have 

Kj+k+i (1 , . . . , 1 , a + 1 , bi , . . . , b k ) 

= K j+k+1 ( 1 , . . . , 1 , a, bi , . . . , b k ) + Kj ( 1 , . . . , 1 ) K k (bi , . . . , b k ) 

5$ Kj +k+ i (1 , ■ • • , 1 , a, b! , . . . , b k ) + Kj +k (l , . . . , 1 , a, b, , . . . , b k ) 
= K j+k+2 (1 ,..., 1 ,a,b 1 ,...,b k ). 

(Motzkin and Straus [278] show how to solve more general maximization 
problems on continuants.) 

6.91 A candidate for the case n mod 1 = j appears in [213, §6], although 
it may be best to multiply the integers discussed there by some constant 
involving ^/ir. Alternatively, Renzo Sprugnoli observes that we can define 

(^)k n (— 1 )^m — k)/m! for integer m ^ 0 and arbitrary n ^ 0; 
then (6.3) holds for all n ^ 1 . 

6.92 (a) If there are only finitely many solutions, it is natural to conjec- 
ture that the same holds for all primes, (b) The behavior of b n is quite 
strange: We have b n = lcm( 1 , . . . , n) for 968 ^ n ^ 1 066; on the other hand, 
b600 = lcm (1 , . . . , 600)/(3 3 -5 2 -43). Andrew Odlyzko observes that p divides 
lcm( 1 , . . . , n) /b n if and only if kp m ^ n < (k + 1 )p m for some m j> 1 and 
some k < p such that p divides the numerator of H k . Therefore infinitely 
many such n exist if it can be shown, for example, that almost all primes 
have only one such value of k (namely k = p — 1 ) . 

6.93 (Brent [38] found the surprisingly large partial quotient 1568705 in e Y , 
but this seems to be just a coincidence. For example, Gosper has found even 
larger partial quotients in 7t: The 453,294th is 12996958 and the 11,504,931st 
is 878783625.) 

6.94 Consider the generating function ]JT m n>o | m r ^ n '|w m z n , w -hi c h has the 
form (wF(cl, b, c) + zF(a', b', c')) n , where F(a, b, c) is the differential op- 
erator a + bd w + c9 z . 

6.95 Complete success might be difficult or impossible, because Stirling 
numbers are not “holonomic” in the sense of [382] . 

7.1 Substitute z 4 for □ and z for □ in the generating function, getting 
1/(1 — z 4 — z 2 ) . This is like the generating function for T, but with z replaced 
by z 2 . Therefore the answer is zero if m is odd, otherwise F m / 2+1 • 

7.2 G(z) = 1/(1 -2z) + 1/(1 — 3z); G(z) = e 2z + e 3z . 

7.3 Set z = 1/10 in the generating function, getting ^ In 


Another reason to 
remember 1066? 
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7.4 Divide P(z) by Q(z), getting a quotient T(z) and a remainder P 0 (z) 
whose degree is less than the degree of Q. The coefficients of T(z) must be 
added to the coefficients [z n ] Pq(z)/Q(z) for small n. (This is the polynomial 
T(z) in ( 7 . 28 ).) 

7.5 This is the convolution of (1 + z 2 ) r with ( 1 + z) r , so 
S(z) = (1 + z + z 2 + z 3 ) r . 


I bet that the con- 
troversial “fan of 
order zero” does 
have one spanning 
tree. 


Incidentally, no simple form is known for the coefficients of this generating 
function; hence the stated sum probably has no simple closed form. (We can 
use generating functions to obtain negative results as well as positive ones.) 

7.6 Let the solution to go = a, gi = (3, g^ = g n -i + 2g n _2 + (—1 1'y be 
g n = A(n)cx + B(n)|3 + C(n)y. The function 2 n works when a = 1, (3=2, 
y = 0; the function (— 1) n works when a = 1, |3 = — 1, y = 0; the function 
(— l) n n works when a = 0, [3 = — 1, y = 3. Hence A(n) + 2B(n) = 2 n , 
A (n) - B(n) = (-1 ) n , and -B(n) + 3C(n) = (-1 ) n n. 

7.7 G(z) = (z/(l — z) 2 )G(z) + 1, hence 


G(z) 


1 — 2 z + z 2 
1 — 3z + z 2 


z 

1 — 3z + z 2 ’ 


we have g n = F 2 n + [n = 0 ]. 

7.8 Differentiate (1 — z ) _x_1 twice with respect to x, obtaining 



Now set x = m. 

7.9 (n+ 1)(H 2 — Ha 1 ) — 2n(H n — 1). 

7.10 The identity H k _i /2 — H_i /2 = jFry + • * • + f = 2 H 2 k — H k implies 

that ( 2 k k )( 2 rk k )(2H2k-H k ) =4-H n . 

7.11 (a) C(z) = A(z)B(z 2 )/(1 — z). (b) zB'(z) = A(2z)e z , hence A(z) = 
fe- z/ 2 B'(|). (c) A(z) = B (z)/ ( 1 -z) r+1 , hence B(z) = (1 -z) r+ 1 A(z) and 
we have f k (r) = ( T + 1 ) (-1 ) k . 

7.12 C n . The numbers in the upper row correspond to the positions of +1 ’s 
in a sequence of +l’s and — l’s that defines a “mountain range”; the numbers 
in the lower row correspond to the positions of — 1’s. For example, the given 
array corresponds to 
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7.13 Extend the sequence periodically (let x m+ k = Xk) and define s n = 
xi + • • • + x n . We have s m = 1, S 2 m = 21, etc. There must be a largest index 
kj such that Sk, = j, Skj+m = l + j, etc. These indices ki , . . . , k^ (modulo m) 
specify the cyclic shifts in question. 

For example, in the sequence (—2, 1 , — 1 , 0, 1 , 1 , — 1 , 1 , 1 , 1 ) with m = 1 0 
and l = 2 we have ki = 1 7, k 2 = 24. 

7.14 G(z) = — 2zG(z) + G(z) 2 + z (be careful about the final term!) leads 
via the quadratic formula to 

a, 1 +2 z-Vl + 4z 2 
G(z) = . 

Hence g 2 n+i = 0 and g 2 n = (—1 ) n (2n)! C n _i , for all n > 0. 

7.15 There are (£)u) n _k partitions with k other objects in the subset con- 

taining n+1. Hence P'(z) = e z P(z). The solution to this differential equation 
is P(z) = e e +c , and c = — 1 since P (0) = 1. (We can also get this result by 
summing (7.49) on m, since u) n = {•£}•) 

7.16 One way is to take the logarithm of 


B(z) = 1/((1 -z) Q ’(1 -z 2 ) az (l -z 3 ) a 3 (l -z 4 ) a4 ...), 

then use the formula for In and interchange the order of summation. 

7.17 This follows since t n e _t dt = n!. There’s also a formula that goes 
in the other direction: 


Gfz) = 


1 

271 


G(ze' 




d 0 . 


7.18 (a) C(z — |); (b) — C'(z); (c) C(z)/£(2z). Every positive integer is 
uniquely representable as m 2 q, where q is squarefree. 

7.19 If n > 0, the coefficient [z n ] exp(xlnF(z)) is a polynomial of degree n 
in x that’s a multiple of x. The first convolution formula comes from equating 
coefficients of z n in F(z) x F(z) v = F(z) x+V . The second comes from equating 
coefficients of z n_1 in F'(z)F(z ) x_1 F(z) y = F'(z)F(z) x+v_1 , because we have 


F'(z)F(z) x ^ = x- 1 ^(F(z) x ) = x - 1 ^nf n (x) z ^ 1 

n^O 


(Further convolutions follow by taking 0/0x, as in ( 7 . 43 ).) 
Still more is true, as shown in [221]: We have 


L 


xf k (x + tk) yf n _k(D + t(n — k)) 


k=0 


(x + xj)f n (x + ij + tn) 


x + tk 


y + t(n — k) 


x + y + tn 
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for arbitrary x, y, and t. In fact, xf n (x + tn)/(x + tn) is the sequence of 
polynomials for the coefficients of fF t (z) x , where 

T t (z) = F(zT t (z) t ). 

(We saw special cases in ( 5 . 59 ) and ( 6 . 52 ).) 

7.20 Let G(z) = X!n>o 9nA n . Then 

z l G (k) (z) = X.n^g n z- k+l = ^(n + k-l)^g n +k-iz n 

n^O n^O 

for all k, l 5; 0, if we regard g n = 0 for n < 0. Hence if Pq(z), . . . , P m (z) are 
polynomials, not all zero, having maximum degree d, then there are polyno- 
mials Po(n), . . . , p m +d( n ) such that 

m+d 

P 0 (z)G(z) + - • + P m (z)G ,m, (z) = Y_ Y Pj( n )gn+j-dZ n ■ 

n^O j=0 


Therefore a different iably finite G(z) implies that 


m+d 

Y_ Pi (n + d) g n +j = 0 , for all n ^ 0 . 

j=o 


The converse is similar. (One consequence is that G(z) is differentiably finite 
if and only if the corresponding egf, G(z), is differentiably finite.) 


This slow method of 
finding the answer 
is just the cashier’s 
way of stalling until 
the police come. 


The USA has 
two-cent pieces, but 
they haven ’t been 
minted since 1873. 


7.21 This is the problem of giving change with denominations 10 and 20, so 
G(z) = 1/(1 — z 10 )(l — z 20 ) = G(z 10 ), where G(z) = 1/(1 — z)(1 — z 2 ). (a) The 
partial fraction decomposition of G(z) is j (1 — z )~ 2 + ^(1 — Z + 1 + \ (1 + Z + 1 , 
so [z 11 ] G(z) = l(2n + 3 + (— l) n ). Setting n = 50 yields 26 ways to make 
the payment, (b) G(z) = (1 + z)/(l — z 2 ) 2 = (1 + z) ( 1 + 2z 2 + 3z 4 + •••), 
so [z n ] G(z) = |n/2J + 1. (Compare this with the value N n = L n /^J + 1 i n 
the text’s coin-changing problem. The bank robber’s problem is equivalent 
to the problem of making change with pennies and tuppences.) 

7.22 Each polygon has a “base” (the line segment at the bottom). If A 
and B are triangulated polygons, let AAB be the result of pasting the base 
of A to the upper left diagonal of A, and pasting the base of B to the upper 
right diagonal. Thus, for example, 


0 A 



(The polygons might need to be warped a bit and/or banged into shape.) 
Every triangulation arises in this way, because the base line is part of a unique 
triangle and there are triangulated polygons A and B at its left and right. 
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Replacing each triangle by z gives a power series in which the coefficient 
of z n is the number of triangulations with n triangles, namely the number 
of ways to decompose an (n + 2)-gon into triangles. Since P = 1 + zP 2 , this 
is the generating function for Catalan numbers Co + Ciz + C2Z 2 + • • • ; the 
number of ways to triangulate an n-gon is C n _2 = ( 2 ( v 2 4 )/(n — 1 ). 

7.23 Let a n be the stated number, and b n the number of ways with a 2 x 1 x 1 
notch missing at the top. By considering the possible patterns visible on the 
top surface, we have 


a n — 2a n _! + 4b n _! + a n _2 + [n — 0] ; 
b n = a n -i + b n _i . 


Hence the generating functions satisfy A = 2zA + 4zB + z 2 A+ 1, B = zA + zB, 
and we have 


A(z) 


1 — z 

(1 + z)(1 — 4z + z 2 ) ' 


This formula relates to the problem of 3 x n. domino tilings; we have a n = 
^(U 2 n + V2n+i+(-1) n ) = l(2 + V3r +1 +l(2-V3r+ 1 +l(-1) n ,which 
is (2 + v^3 ) n+1 /6 rounded to the nearest integer. 

7.24 n k) +... +km=T1 hi • . . . • k m /m = F 2n +i + F 2n -i ~ 2 - (Consider the 
coefficient [z n_1 ] ln(l /(I — G(z))), where G(z) = z/(l — z) 2 .) 

7.25 The generating function is P(z)/(1 — z m ), where P(z) — z + 2z 2 + 
• • • + (m — 1 )z m_1 = ((m — 1 )z m+1 — mz m + z)/(l — z) 2 . The denominator 
is Q(z) = 1 - z m = (1 - co°z) ( 1 - w'z ) . . . (1 - eu^z). By the rational 
expansion theorem for distinct roots, we obtain 


m 1 v — 

n mod m = — 1- 2_ 

k=1 


m. — 1 _Vn 

cu kn 


CU K 


7.26 (1 — z — z 2 )g'(z) = F(z) leads to = (2(n + 1 )F n + TLF n+ i ) /5 as in 
equation (7.61). 

7.27 Each oriented cycle pattern begins with f or “ or a 2 x k cycle (for 
some k ^ 2) oriented in one of two ways. Hence 


Qn — Qn-l + Qn— 2 + 2Q n _ 2 + 2Q n _3 + • • • + 2 Qq 


for n^2; Qo = Qi=1- The generating function is therefore 


Q(z) = zQ(z) + z 2 Q(z) + 2z 2 Q(z)/(1 — z) + 1 
= 1 /(l — z — z 2 — 2z 2 /(1 — z)) 
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= n-z) 

(1 — 2z — 2z 2 + z 3 ) 

(|3 2 /5 cj^> 2 / 5 2/5 

1 — 4) 2 z 1 — tf)~ 2 Z 1 + z ’ 

and Q n = (<^ 2n + 2 + cr 2n " 2 + 2(— 1 ) n )/5 = ((cT +1 - 4>^ +1 )/v / 5 ) 2 = F 2 +1 . 

7.28 In general if A(z) = (1 + z + • • • + z m_1 )B(z), we have A r + A r+m + 
A r + 2 m + ••• = B(1) for 0 / r < m. In this case m = 10 and B(z) = 
(1 +z+---+z 9 )(1 +z 2 +z 4 +z 6 + z 8 )(1 +z 5 ). 

7.29 F(z) + F(z) 2 + F(z) 3 H = z/(1 — z — z 2 — z) = (1/(1 — (1 + \fl )z) — 

( 1/(1 — (1 — \fl )z))/A so the answer is ((1 + \fl ) n — (1 — \/2 ) n )/v / 8- 

7.30 Lk=i ( 2 V-r k ) (a n b n_k /(1 - ctz) k + a n - k b T V(1 - |3z) k ), by exercise 
5.39. 

7.31 The dgf is C(z) 2 / C(z— 1 ); hence we find g(n) is the product of (k+1 — kp) 
over all prime powers p k that exactly divide n. 

7.32 We may assume that each b^ 0. A set of arithmetic progressions 
forms an exact cover if and only if 

1 z b| z bm 

1 — Z 1 — Z a 1 1 — z“ m 

Subtract z bm /(1 — z“ m ) from both sides and set z = e 2ni / a ™_ The left side 
is infinite, and the right side will be finite unless a m _i = a m . 

7.33 (— 1 ) n ~ m+1 [n>m]/(n — m). 

7.34 We can also write G n (z) = L k , +(m+ i )k m+1 =n ) (z m ) k -’ . 

In general, if 


Gn = Y. 

k] +2k 2 H hrk r =n 


kl + k2 + 
kl ,k2, . , 


we have G n = Zi G n _i + Z 2 G n _2 + • • • + z r G n _ r + [n = 0], and the generating 
function is 1/(1 — zi w — Z 2 W 2 — • • • — z r w r ). In the stated special case the 
answer is 1/(1 — w — z m w m+1 ). (See ( 5 . 74 ) for the case m = 1 .) 

7.35 (a) ^Lo<k<n( 1 A + V(n - k)) = ^H n _! . (b) [z^ln -^) 2 = 

nl [ 2 ] = ^H n _i ^7 (7-5°) an d ( 6 - 58 ). Another way to do part (b) is to 
use the rule [z 11 ] F(z) = ^-[z n_1 ] F'(z) with F(z) = (lny^)T 

7.36 
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7.37 (a) The amazing identity Q 2 n = a. 2 n+i = b n holds in the table 


n 

0 1 

2 3 

4 

5 

6 

7 

8 

9 

10 

a n 

1 1 

2 2 

4 

4 

6 

6 

10 

10 

14 

b n 

1 2 

4 6 

10 14 

20 

26 36 

46 

60 

z) = 

= 1/(0 

- z)(1 - 

- z 2 )(l - 

z 4 )(l 

-z 8 ).. 


(c) B(z) = A(z)/(1 - z 


and we want to show that A(z) = (1 + z)B(z 2 ). This follows from A(z) = 
A(z 2 )/(1 -z). 

7.38 (1 — wz)M(w,z) = n> i (min(m,n) — min(m— 1 , n— 1 ))w m z n = 

Hm n>i wTUzTl = wz/(1 — w)(l — z). In general, 


M(zi,...,z m ) = 


z, . . 


f 1 Z 1 ) • • • 0 z m ) ( 1 Zl . . . Z m ) 

7.39 The answers to the hint are 


Y a ki ak 2 • • ■ a k m and y_ a k , a k , . . . a km , 

1 <k2<”'<k m ^n 1 <)k2 ^ •••^k m <Cn 

respectively. Therefore: (a) We want the coefficient of z m in the product 
(1 + z)(l + 2 z) ... (1 + nz). This is the reflection of (z + 1 ) n , so it is [™ 4 ]] + 
[ Tl "n 1 ]z + ' ' ■ + [ n / 1 ]z n and the answer is [ ri ] 1 1 + _ 1 m ]- (b) The coefficient of z m 
in 1/((1 z) ( 1 — 2z) . . . (1 - nz)) is { m + n } by (7.47). 

7.40 The egf for (uF n -i — F n ) is (z — 1 )F(z) where F(z) = bn z V n ! = 

(e‘ i>z - e$ z )/V5. The egf for (nj) is e- z /(1 - z). The product is 

5 - 1/2 ( e ($- 1 )z_ e (c) J - 1 ) z ) = 5 — i/2 (e — <t> Z _ e -$ z ). 


We have F(z)e z = — F(— z). So the answer is (— l) n F n . 

7.41 The number of up-down permutations with the largest element n in 
position 2k is ’j ) A 2 k iA n 2 k- Similarly, the number of up-down permu- 

tations with the smallest element 1 in position 2k + 1 is ( 1 V k 1 ) A2kA n _2k-i , 
because down-up permutations and up-down permutations are equally nu- 
merous. Summing over all possibilities gives 


2A n 



A k A n _-|_ k + 2[n — 0] + [u — 1] . 


The egf A therefore satisfies 2A'(z) = A(z) 2 + 1 and A(0) = 1; the given 
function solves this differential equation. (Consequently A n = |E n | + T n is a 
secant number when n is even, a tangent number when n is odd.) 
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The empty set 
is pointless. 


7.42 Let a n be the number of Martian DNA strings that don’t end with c 
or e; let b n be the number that do. Then 


a n = 3a n _i +2b n _! + [n = 0] , 


A(z) 

A(z) 


3zA(z) + 2zB(z) + 1 , 
1 — z 

1 — 4z — z 2 


B(z) 

B(z) 


2cirL— 1 T b n _i ; 

= 2zA(z)+zB(z) 
2z 

1 — 4z — z 2 ’ 


and the total number is [z n ](l + z) / ( 1 — 4z — z 2 ) = F3 n+ 2- 

7.43 By (5.45), g n = A n G(0). The nth difference of a product can be 
written 


A n A(z)B(z) = Y_ Q(A k E T1 - k A(z))(A n - k B(z)) , 


and E n - k = (1 + A) n - k = ( n 7 k )A j . Therefore we find 
( n\ /n - k> 


H n = Y. 

j,k 


fj+k gn-k • 


This is a sum over all trinomial coefficients; it can be put into the more 
symmetric form 


Ki = Y- 

j+k+l=n 


n 

j,k,l 


fj+k 9k+l • 


7.44 Each partition into k nonempty subsets can be ordered in k! ways, so 

b k =k!. Thus Q(z) = L n , k>0 {k} k!zn /n! = Lk^ ~ = 1/(2- e*). 

And this is the geometric series X! k >o e kz /2 k+1 , hence a k = 1 /2 k+1 . Finally, 
c k = 2 k ; consider all permutations when the x’s are distinct, change each ‘>’ 
between subscripts to '<’ and allow each '<’ between subscripts to become 
either “<’ or (For example, the permutation X1X3X2 produces xi < x 3 < 
X2 and xi = X3 < X2, because 1 < 3 > 2.) 

7.45 This sum is 2In>i r (n.) / tx 2 , where r(n) is the number of ways to write 
n as a product of two relatively prime factors. If n is divisible by t distinct 
primes, r(n) = 2 t . Hence r(n)/n 2 is multiplicative and the sum is 
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7.46 Let S n = Lo<:k$n /2 T k 2k ) ak - Then = s n-i + aSn-3 + [tl = 0], 
and the generating function is 1/(1 — z — az 3 ). When a = — jy, the hint 
tells us that this has a nice factorization 1/(1 ± |z)(1 — |z) 2 . The general 
expansion theorem now tells us that S n = (|n + c)(|) n + ^(— |) n , and the 
remaining constant c turns out to be 

7.47 The Stern-Brocot representation of V3 is R(LR 2 )°°, because 


V3 + 1 = 2 + 


1 + 


V5 + 1 


The fractions are j, 2 , 3 , |, 3 , |2, they eventually have the 

cyclic pattern 


V2n-1+V2n+1 bhn+V/n+l U2 n+ 2±V2n-l V2 n +1 +V2 n +3 
U2n ’ V/n+l ’ U2n+V2n + 1 ’ U2n+2 


7.48 We have go = 0, and if gi = m the generating function satisfies 

qG(z) + bz -1 G(z) + cz~ 2 (G(z) — mz) + = 0. 

Hence G(z) = P(z)/(az 2 + bz + c)(1 — z) for some polynomial P(z). Let pi 
and P 2 be the roots of cz 2 ± bz ± a, with | p 1 1 ^ | P 2 1 - If b 2 — 4ac ^ 0 then 
IPil 2 = P 1 P 2 = a/c is rational, contradicting the fact that ^/gi/ approaches 
1 + \fl. Hence pi = (— b + \J b 2 — 4ca)/2c = 1 + \/2\ and this implies that 
a = — c, b = —2c, P 2 = 1 — \/2- The generating function now takes the form 


G(z) 


z(m — (r + m)z) 

(1 — 2z — z 2 ) ( 1 — z) 


— r + (2m + r)z r 
2(1 — 2z — z 2 ) + 2(1 -z) 


mz + (2m — r)z 2 H , 


where r = d/c. Since g 2 is an integer, r is an integer. We also have 

g n = «(1 +V2) n + &(1 -V2) n + lr = [a^+V2) n \, 

and this can hold only if r = —1, because (1 — V2) n alternates in sign as 
it approaches zero. Hence (a,b,c,d) = ±(1,2,— 1,1). Now we find a = 
^(1 + \/2 m), which is between 0 and 1 only if 0 ^ m ^ 2. Each of 
these values actually gives a solution; the sequences (g n ) are (0, 0, 1 , 3, 8, . . . ), 
(0,1,3,8,20,...), and (0,2,5,13,32,...). 
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7.49 (a) The denominator of (1/(1 — (1 + \/2)z) + 1/(1 — (1 — \/2)z)) is 

1 — 2z — z 2 ; hence a n = 2a n _i + a n _2 for tl A 2. (b) True because a n is even 
and — 1 < 1 yfl < 0. (c) Let 


bn = 


p + \/q 


■V* 


We would like b n to be odd for all n > 0, and —1 < (p — yTf )/2 < 0. Working 
as in part (a), we find bo = 2, b] = p, and b n = pb n _i + b(q — p 2 )b n _2 for 
n ^ 2. One satisfactory solution has p = 3 and q = 1 7. 

7.50 Extending the multiplication idea of exercise 22, we have 

Q = -+ Q A Q + qAq +q Q OV-- 

Replace each ri-gon by z n ~ 2 . This substitution behaves properly under mul- 
tiplication, because the pasting operation takes an m-gon and an n-gon into 
an (m + n — 2)-gon. Thus the generating function is 

Q = 1 + zQ 2 + z 2 Q 3 + z 3 Q 4 + --- = 1+ ^Q_ 

1 — zQ 


Give me Legen- 
dre polynomials 
and I’ll give you a 
closed form. 


and the quadratic formula gives Q = (1 + z — v 7 1 — 6z + z 2 ) /4z. The coeffi- 
cient of z n ~ 2 in this power series is the number of ways to put nonoverlapping 
diagonals into a convex n-gon. These coefficients apparently have no closed 
form in terms of other quantities that we have discussed in this book, but 
their asymptotic behavior is known [207, exercise 2.2.1-12]. 

Incidentally, if each n-gon in Q is replaced by wz n ~ 2 we get 


Q 


1 + z — y / 1 — (4 w + 2)z + z 2 
2(1 + w)z 


a formula in which the coefficient of w m z n 2 is the number of ways to divide 
an n-gon into m polygons by nonintersecting diagonals. 

7.51 The key first step is to observe that the square of the number of ways 
is the number of cycle patterns of a certain kind, generalizing exercise 27. 
These can be enumerated by evaluating the determinant of a matrix whose 
eigenvalues are not difficult to determine. When m = 3 and n = 4, the fact 
that cos 36° = 4>/2 is helpful (exercise 6.46). 

7.52 The first few cases are p 0 (y) = 1, Pi(y) = y, Pzty) = y 2 + y, 
P 3 (y) = y 3 + 3y 2 + 3y. Let p n (y) = 92 n(x) where y = x(1 - x); we 
seek a generating function that defines q 2 n+i (x) in a convenient way. One 
such function is ]T n q n (x)z n /n! = 2e ucz /(e lz + 1), from which it follows 
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that q n (x) = i n E n (x), where E n (x) is called an Euler polynomial. We have 
][_(— 1 ) x x n 6 x = j [— 1 ) x+1 E n (x), so Euler polynomials are analogous to Ber- 
noulli polynomials, and they have factors analogous to those in ( 6 . 98 ). By 
exercise 6.23 we have uE n _i (x) = Hk=o ( k )B k x n ~ k (2 — 2 k+1 ); this polyno- 
mial has integer coefficients by exercise 6.54. Hence c| 2 nM, whose coefficients 
have denominators that are powers of 2, must have integer coefficients. Hence 
p n (y) has integer coefficients. Finally, the relation (4y — 1 )p " (y ) + 2p(Jy) — 
2 n( 2 n — 1 )p n -i (y ) shows that 


2 m( 2 m — 1 


= m(m + 1 


n 

m + 1 


■ 2 n( 2 n - 1 ; 


n- 1 
m — 1 


and it follows that the |^|’s are positive. (A similar proof shows that the 
related quantity (—1 ) n (2n + 2 )E 2 n +i (x)/(2x — 1 ) has positive integer coeffi- 
cients, when expressed as an nth degree polynomial in y.) It can be shown 
that |™| is the Genocchi number (— 1 ) n_1 (2 2n+1 — 2)B2 n (see exercise 6.24), 
“dthat | n 2,| = ©, | n " 2 | =2( n t')+3(J),etc. 

7.53 It is P( 1+ v 4n+1+ v 4n+ 3)/6- Thus, for example, T 20 = P 12 = 210; T 28 5 = 
Pi 65 =40755. 

7.54 Let E k be the operation on power series that sets all coefficients to zero 
except those of z n where n mod m = k. The stated construction is equivalent 
to the operation 


E 0 SE 0 S(Eo + Ei)S ... S(E 0 + Ei +--. + E m _i) 


applied to 1/(1 — z), where S means “multiply by 1/(1 — z).” There are m! 
terms 


E 0 SE k , SE k2 S ... SE km 

where 0 ^ kj < j, and every such term evaluates to z rm /(l — z m ) m+1 if r is 
the number of places where kj < kj + i . Exactly (’))) terms have a given value 
of r, so the coefficient of z mn is (™}( n+ m ~ T ) = ( n + l) m by (6.37). 

(The fact that operation E k can be expressed with complex roots of unity 
seems to be of no help in this problem.) 

7.55 Suppose that Po(z)F(z) + ■ ■ • + P m (z)F (m) (z) = Qo(z)G(z) + ■ ■ ■ + 
Q n (z) G (n * (z) = 0, where P m (z) and Q n (z) are nonzero, (a) Let H(z) = F(z) + 
G(z). Then there are rational functions R k j(z) for 0 ^ l < m + ri such that 
H (k) (z) = R k , 0 (z) F(°) (z) + ■ ■ ■ + R k , m _! (z) F (m_1 1 (z) + R k , m (z) G ( °) (z) + ■ ■ ■ + 
Rk,m+n-i (z)G (n_1) ( Z ). Them + n+1 vectors (R k>0 (z), • • • , Rk, m +n-i (z)) 
are linearly dependent in the (m + n)-dimensional vector space whose com- 
ponents are rational functions; hence there are rational functions Si(z), not 
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all zero, such that So(z)H , 0 *(z) + • • • + S m+n (z)H ,m+n *(z) = 0. (b) Simi- 
larly, let H(z) = F(z)G(z). There are rational R^jfz) for 0 ^ l < mn with 

H (k) (z) = L'o 1 R k,ni+j( z ) Fli) (z)G (i) (z), hence S 0 (z)H ( 0 ) (z) + h 

S m n( z )H ,mn '(z) = 0 for some rational St(z), not all zero. (A similar proof 
shows that if (f n ) and (g n ) are polynomially recursive, so are (f n + g n ) and 
(fngn)- Incidentally, there is no similar result for quotients; for example, cos z 
is differentiably finite, but 1 /cosz is not.) 

7.56 Euler [113] showed that this number is also [z n ] 1/Vi — 2z— 3z 2 , and he 
gave the formula t n = ^ k>0 n— /k ! 2 = ^ k ( k )( n k k )- He also discovered a 
“memorable failure of induction” while examining these numbers: Although 
3t n — trt+i is equal to F n _i (F n . ] + 1 ) for 0 ^ n ^ 8 , this empirical law 
mysteriously breaks down when n is 9 or more! George Andrews [12] has 
explained the mystery by showing that the sum 2 I k [z n+10k ] (1 + z + z 2 ) n can 
be expressed as a closed form in terms of Fibonacci numbers. 

H. S. Wilf observes that [z n ] (a + bz+ cz 2 ) n = [z n ] l/f(z), where f(z) = 

\J 1 — 2 bz + (b 2 — 4ac)z 2 (see [373, page 159]), and it follows that the coeffi- 
cients satisfy 

(n + 1 )A n+ i - (2n -i- 1 )b A n + n(b 2 - 4ac)A n _! = 0 . 

The algorithm of Petkovsek [291] can be used to prove that this recurrence has 
a closed form solution as a finite sum of hypergeometric terms if and only if 
abc(b 2 — 4ac) = 0. Therefore in particular, the middle trinomial coefficients 
have no such closed form. The next step is presumably to extend this result 
to a larger class of closed forms (including harmonic numbers and /or Stirling 
numbers, for example). 

7.57 (Paul Erdos currently offers $500 for a solution.) 

8.1 24 + 48 + 4 k + T 8 + 4 k +24 = S- 0 n f act ’ we a ^ wa y s g et doubles 
with probability A when at least one of the dice is fair.) Any two faces whose 
sum is 7 have the same probability in distribution Pri , so S = 7 has the same 
probability as doubles. 

8.2 There are 12 ways to specify the top and bottom cards and 50! ways 

to arrange the others; so the probability is 12-50 !/52! = 12/(51-52) = = 

l 

221 • 

8.3 yq( 3 + 2 + -- - + 9 + 2) = 4.8; 1(3 2 +2 2 + - • • + 9 2 +2 2 - 10(4.8) 2 ) = 
which is approximately 8 . 6 . The true mean and variance with a fair coin are 
6 and 22, so Stanford had an unusually heads-up class. The corresponding 
Princeton figures are 6.4 and « 12.5. (This distribution has K 4 = 2974, 
which is rather large. Hence the standard deviation of this variance estimate 
when n = 10 is also rather large, ^/2974/10 + 2(22) 2 /9 ss 20.1 according to 
exercise 54. One cannot complain that the students cheated.) 
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8.4 This follows from (8.38) and (8.39), because F(z) = G(z)H(z). (A 
similar formula holds for all the cumulants, even though F(z) and G(z) may 
have negative coefficients.) 

8.5 Replace H by p and T by q = 1 — p. If Sa = Sb = \ we have p 2 qN = j 
and pq 2 N = lq + 1; the solution is p = 1 /4> 2 , q = 1 /4>. 

8.6 In this case X|p has the same distribution as X, for all y, hence 
E(X|Y) = EX is constant and V(E(X|Y)) = 0. Also V(X|Y) is constant and 
equal to its expected value. 

8.7 We have 1 = (pi +P2H Fpe) 2 ^ 6(P 2 +Pjz + ' • -+Pg) by Chebyshev’s 

monotonic inequality of Chapter 2. 

8.8 Let p = Prfcue AC B), q = Pr(co^A), and r = Pr(cu^B). Then 
p + q + r = 1 , and the identity to be proved is p = (p + r) (p + q) — qr. 

8.9 This is true (subject to the obvious proviso that F and G are defined 
on the respective ranges of X and Y), because 

Pr(F(X) = f and G(Y) = g) = Pr(X = x and Y = q) 

xeF-' (f) 
y6G~' (g) 

= Y_ Pr(X = x) -Pr(Y = y) 

xSF-'(f) 
y6G~' (g) 

= Pr(F(X) =f) • Pr(G(y) = g) . 

8.10 Two. Let xi < xz be medians; then 1 Y Pr(X^xi) + Pr(X^X2) Y- 
1, hence equality holds. (Some discrete distributions have no median ele- 
ments. For example, let D. be the set of all fractions of the form ±1 /n, with 
Pr(+l/n) = Pr(— 1/n) = f^n- 2 .) 

8.11 For example, let K = k with probability 4/(k+ 1 )(k + 2)(k + 3), for all 
integers k ^ 0. Then EK = 1, but E(K 2 ) = 00. (Similarly we can construct 
random variables with finite cumulants through K m but with K m+ i = 00.) 

8.12 (a) Let p^ = Pr(X = k). If 0 < x 1, we have Pr(X^r) = k<r Pk Y: 

L k$r x k -p k Y T pic = x r P(x). The other inequality has a similar 

proof, (b) Let x = a/(l — a) to minimize the right-hand side. (A more precise 
estimate for the given sum is obtained in exercise 9.42.) 

8.13 (Solution by Boris Pittel.) Let us set Y = (Xi + • • • + X n )/n and 
X = (X n+1 + • • ■ + X2n)/FL. Then 
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$; Pr 


Y — a 


Z — a 


< Y — a 


= Pr (|Z — <x| C-. Y-a) ^ ( . 


The last inequality is, in fact, ‘>’ in any discrete probability distribution, 
because Pr(Y = Z) >0. 

8.14 Mean(H) = p Mean(F) + q Mean(G); Var(H) = p Var(F) + q Var(G) + 
pq (Mean( F) — Mean( G)) 2 . (A mixture is actually a special case of conditional 
probabilities: Let Y be the coin, let X|H be generated by F(z), and let X|T 
be generated by G(z). Then VX = EV(X|Y) + VE(X|Y), where EV(X|Y) = 
pV(X|H) + qV(X|T) and VE(X|Y) is the variance of pz Mean,F) + qz Me “(G).) 

8.15 By the chain rule, H'(z) = G'(z)F'(G(z)); H"(z) = G"(z)F'(G(z)) + 
G'(z) 2 F"(G(z)). Hence 


Mean(H) = Mean(F) Mean(G) ; 

Var(H) = Var(F) Mean(G) 2 + MeanfF) Var(G) . 


(The random variable corresponding to probability distribution H can be un- 
derstood as follows: Determine a nonnegative integer n by distribution F; 
then add the values of n independent random variables that have distribu- 
tion G. The identity for variance in this exercise is a special case of (8.106), 
when X has distribution H and Y has distribution F.) 

8.16 e w(z - 1) /n -w). 

8.17 Pr(Y np ^m) = Pr(Y np +n^m + n) = probability that we need ^ 
m + n tosses to obtain n heads = probability that m + n tosses yield >. n 
heads = Pr(X m+np ^n). Thus 



and this is (5.19) with n = r, x = q, y = p. 

8.18 (a) Gx(z) = (b) The mth cumulant is p, for all m ^ 1. (The 

case p = 1 is called F^ in (8.55).) 

8.19 (a) Gx,+x 2 ( z ) — Gx, (z)Gx 2 (z) = Hence the proba- 

bility is e~ |Z1 ~ vl2 (pi + p2) n / nF the sum of independent Poisson variables is 
Poisson, (b) In general, if K m X denotes the mth cumulant of a random vari- 
able X, we have K m (aXi +bX2) = a m (K m Xi) + b m (K m X2), when a,b ^ 0. 
Hence the answer is 2 m pi + 3 m p2- 
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8.20 The general pgf will be G(z) = z m /F(z), where 


F(z) = z m + (l-z)^A (k) [Al k )=A (k) ]z m - k , 

k=1 

m 

F'(l) = m-^A (k) [A< k )=A (k) ], 

k=1 

m 

F"(l) = m(m — 1 ) — 2 ^ (m — k)A( k) [A ,k) = A (k) ] . 

k=1 


8.21 This is ]T n>0 w here q n is the probability that the game between 
Alice and Bill is still incomplete after n flips. Let p n be the probability that 
the game ends at the nth flip; then p n + 4n = On-i • Hence the average time 

to play the game is n Pn = (Po — qi) +2(qi - c\ 2 )+3(q 2 - q3 ) H = 

q 0 + qi + q2 H = N, since lim^oo nq n = 0. 

Another way to establish this answer is to replace H and T by jZ. 
Then the derivative of the first equation in (8.78) tells us that N (1 ) + N '(1 ) = 

By the way, N = ' 3 6 . 

8.22 By definition we have V(X | Y) = E(X 2 |Y)~ (F(X|Y)) 2 and V(E(X|Y)) = 
E((E(X|Y)) 2 ) - (E(E(X|Y))) 2 ; hence E(V(X|Y)) + V(E(X|Y)) = E(E(X 2 |Y)) - 
(E(E(X|Y))) 2 . But E(E(X|Y)) = Pr(Y = y)E(x|y) = y Pr(Y=y) x 
Pr((X|y) =x) = EX and E(E(X 2 |Y)) = E(X 2 ), so the result is just VX. 

8.23 Let Qo ={Q> EH } 2 and = { EH- 0- ED- 0 Pi and iet ^2 be the 
other 16 elements of Q. Then Pr^ (cu) — Proo(tu) = ty§, gyg, tjc according 
as cu £ Q.q, £1] , Cl 2 . The events A must therefore be chosen with kj elements 
from Oj, where (ko, ki , \c 2 ) is one of the following: (0,0,0), (0, 2, 7), (0,4, 14), 
(1,4,4), (1,6,11), (2,6,1), (2,8,8), (2,10,15), (3,10,5), (3,12,12), (4,12,2), 
(4, 14, 9), (4, 1 6, 1 6). For example, there are ( 2 ) ( 1 g 6 ) (\ 6 ) events of type (2, 6, 1 ). 
The total number of such events is [z°](l +z 20 ) 4 (1 +z~ 7 ) 16 (1 +z 2 ) 16 , which 
turns out to be 1304872090. If we restrict ourselves to events that depend 
on S only, we get 40 solutions S € A, where A = 0, { 2 2 , 4 0 , |}, { 2 2 ,5,9}, 
{2, 12, 4 0 , g ,5, 9}, {2,4, 6, 8, 10, 12}, { 0 ,7, f , 4, 1 0}, and the complements of 
these sets. (Here the notation ‘ J 2 , ’ means either 2 or 12 but not both.) 

8.24 (a) Any one of the dice ends up in J’s possession with probability 
p = i + (f ) 2 P; hence p = yp. Let q = -fj. Then the pgf for J’s total holdings 
is (q +pz) 2n+1 , with mean (In + 1 )p and variance (In + l)pq, by (8.61). 

GO (>V + (> 4 q + ©P 5 = » * 585. 
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This problem can 
perhaps be solved 
more easily without 
generating functions 
than with them. 


8.25 The pgf for the current stake after n rolls is G n (z), where 


Go(z) = z A ; 

G n (z) = G n _i(z 2(lc “ 1)/5 )/6, for n > 0. 


(The noninteger exponents cause no trouble.) It follows that Mean(G n ) = 
Mean(G n _i ), and Var(G n ) + Mean(G n ) 2 = y|(Var(G n _i ) + Mean(G n _i ) 2 ). 
So the mean is always A, but the variance grows to ((f§) n_ 1)A 2 . 

8.26 The pgf F^ n (z) satisfies F[ n (z) = F^ rL _^ (z) /l; hence Mean(Fp n ) = 
F( n (1) = [n^l]/l and F{' n (1) = [n ^21] /l 2 ; the variance is easily computed. 
(In fact, we have 


Fl,n(z) 


L 


i 

id 



which approaches a Poisson distribution with mean 1 /l as n — > oo.) 

8.27 (n 2 L3 — 3nl2^i + 2L 3 )/h(h — 1 ) (n — 2) has the desired mean, where 

£k = X^ + • • ■ + This follows from the identities 


EI 3 = np 3 ; 

= np 3 +n(n- l)p2Pi ; 

E(I 3 ) = np 3 + 3n(n- 1)p 2 m + n(n - 1 )(n - 2)pf . 

Incidentally, the third cumulant is k 3 = E((X— EX) 3 ), but the fourth cumulant 
does not have such a simple expression; we have K4 = E((X — EX) 4 ) — 3(VX) 2 . 

8.28 (The exercise implicitly calls for p = q = 7, but the general answer is 
given here for completeness.) Replace H by pz and T by qz, getting Sa(z) = 
p 2 qz 3 /(l — pz)(l — qz)(l — pqz 2 ) and Sb (z) = pq 2 z 3 /(l — qz)(l — pqz 2 ). The 
pgf for the conditional probability that Alice wins at the nth flip, given that 
she wins the game, is 

Sa(z) = z3 _ q p_ 1 -pq 

Sa(1) 1 — pz 1 — qz 1 — pqz 2 ' 

This is a product of pseudo-pgf’s, whose mean is 3+p/q + q/p+2pq/(1 — pq). 
The formulas for Bill are the same but without the factor q/(l —pz), so Bill’s 
mean is 3 + q/p + 2pq/( 1 — pq). When p = q = the answer in case (a) is 
i?; in case (b) it is ip. Bill wins only half as often, but when he does win he 
tends to win sooner. The overall average number of flips is = 

agreeing with exercise 21. The solitaire game for each pattern has a waiting 
time of 8. 
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8.29 Set H = T = \ in 

1 +N(H + T) = N + Sa + Sb+Sc 

NHHTH = S a (HTH+ 1) + S b (HTH + TH) +S c (HTH + TH) 

NHTHH = Sa(THH + H) + Sb(THH+ 1) + Sc(THH) 

NTHHH = S a (HH) + S b (H) + S c 

to get the winning probabilities. In general we will have Sa + S B + Sq = 1 
and 


S a (A:A)+Sb(B:A) + S c (C:A) = S a (A:B) + S b (B:B) + S c (C:B) 

= S a (A:B) + Sb(B:C) + S c (C:C). 

In particular, the equations 9 Sa + 3S B + 3Sc = 5 Sa + 9S B + Sc = 2 Sa + 
4S b + 8S C imply that S A = = 53, S c = 53. 

8.30 The variance of P(hi , . . . , h n ; k) |k is the variance of the shifted bino- 
mial distribution ((m — 1 +z)/m) k_1 z, which is (k— 1 )( A)(1 — -E) by (8.61). 
Hence the average of the variance is Mean(S)(m— l)/m 2 . The variance of 
the average is the variance of (k— 1)/m, namely Var(S)/m 2 . According to 
(8.106), the sum of these two quantities should be VP, and it is. Indeed, we 
have just replayed the derivation of (8.96) in slight disguise. (See exercise 15.) 

8.31 (a) A brute force solution would set up five equations in five unknowns: 

A = ^zB + jzE ; B = jzC ; C = 1 + jzB + jzD ; 

D = jzC+jzE; E = jzD . 

But positions C and D are equidistant from the goal, as are B and E, so we 
can lump them together. If X = B + E and Y = C + D, there are now three 
equations: 

A = jzX ; X = jzY ; Y = 1+±zX+±zY. 

Hence A = z 2 /(4 — 2z — z 2 ); we have Mean(A) = 6 and Var(A) = 22. (Rings 
a bell? In fact, this problem is equivalent to flipping a fair coin until get- 
ting heads twice in a row: Heads means “advance toward the apple” and 
tails means “go back.”) (b) Chebyshev’s inequality says that Pr(S^lOO) = 
Pr((S — 6) 2 ^ 94 2 ) 22/94 2 ss .0025. (c) The second tail inequality says 

that Pr(S ^ 100) V 1 /x 98 (4 — 2x — x 2 ) for all x 5s I, and we get the upper 
bound 0.00000005 when x = (V49001 — 99) /1 00. (The actual probability is 
approximately 0.0000000009, according to exercise 37.) 
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“Toto, I have a 
feeling we’re not in 
Kansas anymore.” 

— Dorothy 


8.32 By symmetry, we can reduce each month’s situation to one of four 
possibilities: 

D, the states are diagonally opposite; 

A, the states are adjacent and not Kansas; 

K, the states are Kansas and one other; 

S, the states are the same. 


Considering the Markovian transitions, we get four equations 


D — 1 + z(^D + 

A = z(|A + ^K) 

K = Z (|D + |A+^K) 
S = z(|D + lA+^K) 


whose sum isD + K + A + S = l+ z(D + A + K). The solution is 

81 z — 45 z 2 — 4z 3 
243 - 243z + 24z 2 + 8z 3 ’ 

but the simplest way to find the mean and variance may be to write z = 1 + w 
and expand in powers of w, ignoring multiples of w 2 : 


D 

A 

K 


— 27 , 1593 . 

— 1C T c:i 1 * 


Now S'(l) = 
The mean is y| 


f + 


15 

8 


75 

16’ 


and j S " ( 1 ) 


1593 , 21 15 , 2661 
512 “ r 256 256 


and the variance is . (Is there a simpler way?) 


1 1 145 
512 • 


8.33 First answer: Clearly yes, because the hash values h-i , . . . , h n are 
independent. Second answer: Certainly no, even though the hash values hi , 
. . . , h n are independent. We have Pr (Xj = 0) = )T k- =1 s k ([j ^k](m-1)/m) = 
(1 — Sj)(m — 1 )/m, but Pr(X] =Xi =0) = X! k =i s k [k>2](m — 1 ) 2 /m 2 = 
(1 — si — S 2 )(m— l) 2 /m 2 ^ Pr(X] =0) Pr(X 2 =0). 


8.34 Let [z n ] S m (z) be the probability that Gina has advanced < m steps 
after taking n turns. Then S m (1) is her average score on a par-m hole; 
[z m ] S m (z) is the probability that she loses such a hole against a steady player; 
and 1 — [z m_1 ] S m (z) is the probability that she wins it. We have the recur- 
rence 


So(z) = 0; 

S m (z) = (1 +pzS m _ 2 (z) + qzS m _i(z))/(1 -rz) , for in > 0. 
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To solve part (a), it suffices to compute the coefficients for m, n ^ 4; it is 
convenient to replace z by lOOw so that the computations involve nothing 
but integers. We obtain the following tableau of coefficients: 


So 

0 

0 

0 

0 

0 

s, 

1 

4 

16 

64 

256 

s 2 

1 

95 

744 

4432 

23552 

s 3 

1 

100 

9065 

104044 

819808 

s 4 

1 

100 

9975 

868535 

12964304 


Therefore Gina wins with probability 1 — .868535 = .131465; she loses with 
probability .12964304. (b) To find the mean number of strokes, we compute 


Sid) = if; S 2 (l) 


4675 . 
2304 » 


s 3 (i) 


667825 . 
221184 > 


S 4 (D 


85134475 
2 1233664 ' 


(Incidentally, S 5 ( 1 ) « 4.9995; she wins with respect to both holes and strokes 
on a par-5 hole, but loses either way when par is 3.) 


8.35 The condition will be true for all n if and only if it is true for n = 1, 
by the Chinese remainder theorem. One necessary and sufficient condition is 
the polynomial identity 


(P2+P4+P6 + (Pi +P3+P5)w) (p 3 +P 6 + (P 1 +P 4 )z + (p 2 +p 5 )z 2 ) 
= (PlWZ+ p 2 z 2 +p 3 w +p 4 z+ p 5 wz 2 + p 6 ) , 


but that just more-or-less restates the problem. A simpler characterization is 


(P 2 +P4 + Pe)(P3 +Pe) = P 6 , (Pi +P3 +P5)(P2 +P5) = P5 , 


which checks only two of the coefficients in the former product. The general 
solution has three degrees of freedom: Let do + ai = bo + bi + b 2 = 1, and 
put p! = ai bi , p 2 = a 0 b 2 , p 3 = ai b 0 , P4 = a 0 bi , p 5 = ai b 2 , p 6 = a 0 b 0 . 

8.36 (a) [cj [G] [yj [y] [HI . (b) If the kth die has faces with 

Si , . . . , Sg spots, let pic(z) = z S| + • • • + z S6 . We want to find such poly- 
nomials with pi (z) . . . p n (z) = (z + z 2 + z 3 + z 4 + z 5 + z 6 ) n . The irre- 
ducible factors of this polynomial with rational coefficients are z n (z+ 1) n x 
(z 2 + z + 1 ) n (z 2 — z + 1 ) n ; hence pk(z) must be of the form z ak (z + 1 ) bk x 
(z 2 + z + 1 ) Ck (z 2 — z + I ) dk . We must have dk 1 , since Pk(0) = 0; and in 
fact dk = 1 , since di + • ■ ■ + d n = n. Furthermore the condition Pk(l ) = 6 
implies that bk = Ck = 1 . It is now easy to see that 0 ^ dk ^ 2, since 
dk > 2 gives negative coefficients. When d = 0 and d = 2, we get the two 
dice in part (a); therefore the only solutions have k pairs of dice as in (a), 
plus n — 2k ordinary dice, for some k <: jn. 
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8.37 The number of coin-toss sequences of length n is F n _i , for all n > 0, 
because of the relation between domino tilings and coin flips. Therefore the 
probability that exactly n tosses are needed is F n _i / 2 n , when the coin is fair. 
Also q n = F n+ i /2 n_1 , since H k 5 ;Tt F k z k = (F n z n + F n _i z n+1 )/(l -z-z 2 ). 
(A systematic solution via generating functions is, of course, also possible.) 

8.38 When k faces have been seen, the task of rolling a new one is equivalent 

to flipping coins with success probability p k = (m — k)/m. Hence the pgf is 
nic=o Pk z /0 “ Pkz) = nl=o(m - k)z/(m - kz). The mean is )T k J 0 p " 1 = 
m(H m - H m _x); the variance is m 2 (Hm - - m(H m - H m _i); and 

equation ( 7 . 47 ) provides a closed form for the requested probability, namely 
m~ n m!{' ] l 2 1 1 }/(m— l)!. (The problem discussed in this exercise is tradition- 
ally called “coupon collecting.”) 

8.39 E(X) = P(— 1 ); V(X) = P(— 2) - P(-l ) 2 ; E(lnX) = -P'(0). 

8.40 (a) We have K m =n(0!{’7}p - 1 !{^}p 2 + 2!{^}p 3 ), by ( 7 . 49 ). 

Incidentally, the third cumulant is np q ( q — p ) and the fourth is up q ( 1 — 6p q ) . 
The identity q+pe t = (p+qe^je* shows that f m (p) = (—1 ) m f m (q) + [ru= 1]; 
hence we can write f m (p) = g m (pq)(q — p)^ m odd ^, where g m is a polynomial 
of degree [tu/2J, whenever m > 1. (b) Let p = j and F(t) = ln(j + ^-e 4 ). 
Then m>1 K m t m ~y(m— 1 )! = F'(t) = 1 — l/(e t + l ), and we can use exercise 
6.23. 

8.41 If G(z) is the pgf for a random variable X that assumes only positive 
integer values, then jJ, G(z) dz/z = £Lk>i Pr(X = k)/k = E(X _1 ). If X is the 
distribution of the number of flips to obtain n + 1 heads, we have G(z) = 
(pz /(1 — qz)) T1+ by ( 8 . 59 ), and the integral is 

-1 / pz \ n+1 dz w n dw 

. 0 V 1 -qz/ z J 0 1 + (q/p)w 

if we substitute w = pz/(1 — qz). When p = q the integrand can be written 
(-1 ) n ((1 -Hw )- 1 -1 +w-w 2 + - • - + (-l )V-' ), so the integral is (-1 ) n (in 2 - 

1 + j — yH F(— l) n / n )- We have Han. — Fl n = ln 2 — + ygn ~ 2 + 0 (n~ 4 ) 

by ( 9 . 28 ), and it follows that EfX^^, ) = 2 n 1 ~~ + 0 (n~ 4 ). 

8.42 Let F n (z) and G n (z) be pgf’s for the number of employed evenings, if 
the man is initially unemployed or employed, respectively. Let qn = 1 — Ph 
and qf = 1 — Pf. Then Fo(z) = Go(z) = 1, and 

F n (z) = PhzGn-i (z) + q h F n -i (z) ; 

G n (z) = p f F n _! (z) + q f zG n _i (z) . 
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The solution is given by the super generating function 

G(w,z) = ^ G n (z)w n = A(w)/ (1 — zB(w)) , 

u^O 

where B(w) = w(qf — (qf — Ph.)w)/(1 — q^w) and A(w) = (1 — B(w))/(1 — w). 
Now Inj o Gnl 1 )^ = aw/(1 — w ) 2 + (3/(1 -w)-| 3/(1 -(q f -p h )w) where 

Pk n Pf(qf-Ph) 

a = , 3 = rr - ; 

Ph + Pf (Ph+Pf ) 2 

hence G(J1 ) = an + (3 ( 1 — (qf — PhJ n )- (Similarly G"(l) = a 2 n 2 + O(n), so 
the variance is O(n).) 

8.43 G n (z) = X k >o [k] zk / n ! = z n / n !i by ( 6 . 11 ). This is a product of 
binomial pgf ’s, ]~[k=i ((k— 1 +z) /k) , where the kth has mean 1 /k and variance 
(k— 1)/k 2 ; hence Mean(G n ) = H n and Var(Gn.) = H n — H|r'. 

8.44 (a) The champion must be undefeated in n rounds, so the answer 
is p n . (b,c) Players xi , . . . , X 2 k must be “seeded” (by chance) in distinct 
subtournaments and they must win all 2 k (n — k) of their matches. The 2 n 
leaves of the tournament tree can be filled in 2 n ! ways; to seed it we have 
2 k !( 2 n ~ k ) 2 ways to place the top 2 k players, and ( 2 n — 2 k )! ways to place 
the others. Hence the probability is (2p ) 2 ( n-k )/( 2 k ). When k = 1 this 
simplifies to (2p 2 ) n ~V(2 n — 1). (d) Each tournament outcome corresponds 
to a permutation of the players: Let yi be the champ; let yz be the other 
finalist; let y 3 and y 4 be the players who lost to y i and y 2 in the semifinals; let 
(ys, . . . ,ys) be those who lost respectively to (yi , . . . , y 4 ) in the quarterfinals; 
etc. (Another proof shows that the first round has 2 rl !/2 Tl ~ 1 ! essentially 
different outcomes; the second round has 2 n_1 !/2 n ~ 2 !; and so on.) (e) Let Sk 
be the set of 2 k_1 potential opponents of X 2 in the kth round. The conditional 
probability that xz wins, given that X] belongs to Sk, is 

Pr(xi plays X 2 ) -p ™ -1 ( 1 — p ) + Pr(xi doesn’t play X 2 ) -p n 
= p^p^d-p) + (l-p^p*. 

The chance that xi € Sk is 2 k ~y(2 n — 1 ); summing on k gives the answer: 

Z^=T( pk_1pn_1(1_p) + (1 - p,t " 1) P n ) = ^ J 2 V 2 T_^ v n -^ 

k=1 Z 

(f) Each of the 2 n ! tournament outcomes has a certain probability of occur- 
ring, and the probability that Xj wins is the sum of these probabilities over 
all (2 n — 1 ) ! tournament outcomes in which Xj is champion. Consider inter- 
changing Xj with Xj + i in all those outcomes; this change doesn’t affect the 
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probability if Xj and Xj + i never meet, but it multiplies the probability by 
(1 — p)/p < 1 if they do meet. 

8.45 (a) A(z) = 1/(3 — 2z); B(z) = zA(z) 2 ; C(z) = z 2 A(z) 3 . The pgf for 

sherry when it’s bottled is z 3 A(z) 3 , which is z 3 times a negative binomial 
distribution with parameters n = 3, p = -j. (b) Mean(A) = 2, Var(A) = 6; 
Mean(B) = 5, Var(B) = 2Var(A) = 12; Mean(C) = 8, Var(C) = 18. The 
sherry is nine years old, on the average. The fraction that’s 25 years old is 
(/|) (— 2) 22 3~ 25 = ( 2 2)2 22 3~ 25 = 23-(|) 24 ss .00137. (c) Let the coefficient 
of w n be the pgf for the beginning of year n. Then 


A = (1 + jw/(l - w))/(l - |zw) ; 
B = (1 + lzwA)/(l - fzw) ; 

C = (l + jZwB)/(l — |zw) . 


Differentiate with respect to z and set z = 1 ; this makes 

C ' = - V2 _ 3/2 _ 6 

1 — w (1 — |w) 3 (1 — |w) 2 1 — §w 

The average age of bottled sherry n years after the process started is 1 greater 
than the coefficient of w n_1 , namely 9— (|) n (3n 2 +21n+72)/8. (This already 
exceeds 8 when n = 11.) 

8.46 (a) P(w, z) = 1 + j (wP(w, z) + zP(w, z)) = (l — j(yv + z)) -1 , hence 

p mn =2- m — ( m + n ). (b) P k (w,z) = l(w k +z k )P( W ,z); hence 

m T k ))' 

(C) k2 k ~ 2n ( 2l p k ) = LJ, 0 (n - this can 

be summed using (5.20): 


Pk,m,n — 2 


k 1 — m — n 


m + n — k 
m 


+ 


Y_ 2~ n ~ k ( (2n + 1 


k=0 


n + k 


n 


- n- 


= (2n + 1 ) - (n + 1 )2~ n ( 2 n+1 

_ 2n + 1 fin 
2 2n 1 n 


-2 


n + 1 + k 
n + 1 

In + 1 


-n-l 


n - 


1 


(The methods of Chapter 9 show that this is 1^/n/n — 1 + 0(n 1 ^ 2 ).) 

8.47 After n irradiations there are n + 2 equally likely receptors. Let the 
random variable X n denote the number of diphages present; then X^+i = 
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X n + Y n , where Y n = — 1 if the (n + 1)st particle hits a diphage receptor 
(conditional probability 2X n /(n + 2)) and = +2 otherwise. Hence 

EX n+1 = EX n + EY n = EX n — 2EX n /(n+2) + 2(1 — 2EX n /(n+2)) . 

The recurrence (n+2)EX n+ i = (n— 4) EX n +2u+4 can be solved if we multiply 
both sides by the summation factor (n + 1 )-; or we can guess the answer and 
prove it by induction: EX n = (2u + 4)/7 for all u > 4. (Incidentally, there 
are always two diphages and one triphage after five steps, regardless of the 
configuration after four.) 

8.48 (a) The distance between frisbees (measured so as to make it an even 
number) is either 0, 2, or 4 units, initially 4. The corresponding generating 
functions A, B, C (where, say, [z n ] C is the probability of distance 4 after n 
throws) satisfy 

A = \zB, B = ^zB + lzC, C = l + lzB + |zC. 

It follows that A = z 2 /(16 — 20z + 5z 2 ) = z 2 /F(z), and we have Mean(A) = 
2 — Mean(E) = 12, Var(A) = — Var(F) = 100. (A more difficult but more 
amusing solution factors A as follows: 

A - Pi z . P2 Z _ P2 Piz Pi P2 Z 

1 — <4 1 z 1 - q 2 z p 2 -Pi1-qiz Pi-P 2 l-q 2 z ’ 

where pi = 4> 2 /4 = (3 + V5 )/8, p 2 = $ 2 /4 = (3 — V5)/8, and p i + q i = 
P 2 + q 2 = 1. Thus, the game is equivalent to having two biased coins whose 
heads probabilities are pi and P 2 ; flip the coins one at a time until they 
have both come up heads, and the total number of flips will have the same 
distribution as the number of frisbee throws. The mean and variance of the 
waiting times for these two coins are respectively 6 =F 2\/5 and 50 =F 22\/5, 
hence the total mean and variance are 12 and 100 as before.) 

(b) Expanding the generating function in partial fractions makes it 
possible to sum the probabilities. (Note that v5/(4<J>) + 4> 2 /4 = 1, so the 
answer can be stated in terms of powers of 4>.) The game will last more than 
n steps with probability 5 ,n_1 ,/2 4~ n (<|) n+2 — 4> n 2 ) ; when n is even this is 
5 n/2 4 -n Fn+2 _ So the answer is 5 50 4 -l 00 Fl02 _ _ 00 ()06. 

8.49 (a) If n > 0, P N (0,u) = 1 [1M = 0] + lp N _ 1 (0,n) + 4 P N -i (1 ,n-1); 
Pisi(m, 0) is similar; Pn( 0,0) = [N=0]. Hence 


9m, n — 4 ^ 9 m— l,n +1 “1“ 2 ^9m,n H" 4 ^9m+ 1 ,n.— 1 ] 
90,n = 2 + l Z 90,n + 49l,n-l i etc. 


0 3 ) 9m, n ^ 4" 4 9m— 1 ,n+l 2 9m, n^ - 4 9m+1 
etc. By induction on m, we have n = 


,n-i; 90, n = i4?9o,n + ?9l,n-i; 
: (2m + 1)go, m+n - 2m2 for a11 
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m, n ^ 0. And since 0 = g' 0 , we must have g^ n = m + n + 2mn. 

(c) The recurrence is satisfied when mn > 0, because 


sin(2m + 1 )0 


1 / sin(2m — 1 )0 

cos 2 0 \ 4 


+ 


sin(2m+1)0 sin(2m + 3)0 


this is a consequence of the identity sin ( x — y ) + sin ( x + y ) = 2sinxcosy. So 
all that remains is to check the boundary conditions. 

8.50 (a) Using the hint, we get 



now look at the coefficient of z 3+l . (b) H(z) = | + ^z + j X!i>o c 3+i z2+l - 

(c) Let r = y^l — z)(9 — z). One can show that (z— 3 + r)(z— 3 — r) = 4z, and 
hence that (r/(l — z) + 2) 2 = (13 — 5z + 4r)/(l — z) = (9 — H(z))/(l — H(z)). 

(d) Evaluating the first derivative at z = 1 shows that Mean(H) = 1. The 
second derivative diverges at z = 1 , so the variance is infinite. 

8.51 (a) Let H n (z) be the pgf for your holdings after n rounds of play, with 

Hq(z) = z. The distribution for n rounds is 


H n+ l(z) = H n (H(z)), 

so the result is true by induction (using the amazing identity of the preceding 
problem), (b) g n = H n (0) — H n _i (0) = 4/n(n + 1 )(n + 2) = 4(n— 1 )— . The 
mean is 2, and the variance is infinite, (c) The expected number of tickets you 
buy on the nth round is Mean(H n ) = 1 , by exercise 15. So the total expected 
number of tickets is infinite. (Thus, you almost surely lose eventually, and you 
expect to lose after the second game, yet you also expect to buy an infinite 
number of tickets.) (d) Now the pgf after n games is H n (z) 2 , and the method 
of part (b) yields a mean of 16 — |7t 2 ss 2.8. (The sum X!k>i V^ 2 — tt 2 /6 
shows up here.) 

8.52 If cu and cu' are events with Pr(cu) > Pr(cu'), then a sequence of 
n independent experiments will encounter cu more often than cu', with high 
probability, because cu will occur very nearly nPr(cu) times. Consequently, 
as n -4 oo, the probability approaches 1 that the median or mode of the 
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values of X in a sequence of independent trials will be a median or mode of 
the random variable X. 

8.53 We can disprove the statement, even in the special case that each 

variable is 0 or 1 . Let po = Pr(X = Y = Z = 0), pi = Pr(X = Y = Z = 0), 

P 7 = Pr(X = Y = Z = 0), where X = 1 — X. Then Po + Pi + • ■ ■ + P 7 = 1 , and 

the variables are independent in pairs if and only if we have 

(P4 +P5 +P6 +P7)(P2 + P3 +P6 + P7) = P6 + P7 , 

(P4 +P5 +P6 +P7)(Pl +P3+P5 + P7) = P5 + P7 , 

(P2 +P3 +P6 +P7)(Pl +P3 +P5 + P7) = P3+P7- 

But Pr(X + Y = Z = 0) ^ Pr(X + Y = 0)Pr(Z = 0) <=} Po ^ (Po + Pi )(po + 
P 2 + P 4 + Pe). One solution is 

Po = P3 = P5 = P6 = 1/4; Pi = P 2 = P4 = P7 = o. 

This is equivalent to flipping two fair coins and letting X = (the first coin 
is heads), Y = (the second coin is heads), Z = (the coins differ). Another 
example, with all probabilities nonzero, is 

Po = 4/64, pi = P 2 = P4 = 5/64, 

P3 = Ps = Pe = 10/64, p 7 = 15/64. 

For this reason we say that n variables Xi , . . . , X n are independent if 

Pr(X) =xi and ••• and X n =x n ) = Pr(Xi = Xi ) . . . Pr(X n =x n ) ; 

pairwise independence isn’t enough to guarantee this. 

8.54 (See exercise 27 for notation.) We have 

E(I^) = np 4 +n(n- 1 )p 2 i 

E(I 2 lf) = uy 4 +2n(n— 1)p 3 p! +n(n— 1)p| + n(n-1 )(n-2)p 2 pf ; 
E(I?) = np 4 +4n(n-1)p 3 m +3n(n— 1)p 2 

+ 6n(n— 1 ) (n— 2) p 2 pf + n(n— 1 ) (n— 2) (n— 3) p? ; 
it follows that V(VX) = K 4 /n + 2k 2 /(ti — 1 ). 

8.55 There are A = jj • 52! permutations with X = Y, and B = || • 52! 
permutations with X / Y. After the stated procedure, each permutation 
with X = Y occurs with probability jj /{{ 1 — ylp)A), because we return 
to step SI with probability ||p. Similarly, each permutation with X / Y 
occurs with probability ||( 1 — p)/((1 — ||p]B). Choosing p = ^ makes 
Pr(X=xandY = y) = for all x and y. (We could therefore make two flips 
of a fair coin and go back to SI if both come up heads.) 
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8.56 If m is even, the frisbees always stay an odd distance apart and the 
game lasts forever. If m = 21 + 1 , the relevant generating functions are 

Gm 4 Z/ ^1 i 

Ai = jzAi + ±zA 2 , 

A k = ^zA k _i + jzA k + ^zA k+ i , for 1 < k < l, 

Ai = ^zAi_i + |zAi + 1 . 

(The coefficient [z n ] A k is the probability that the distance between frisbees 
is 2k after n throws.) Taking a clue from the similar equations in exercise 49, 
we set z = 1 /cos 2 0 and Ai = X sin 20, where X is to be determined. It follows 
by induction (not using the equation for A;) that A k = Xsin2k0. Therefore 
we want to choose X such that 

(l- . 3 ? a ) X sin 218 = 1 + ^ 1 X sin(2l — 2)0 . 

V 4 cos 2 0 / 4 cos 2 0 

It turns out that X = 2 cos 2 0/ sin 0 cos (21 +1)0, hence 

cos 0 

G m — — . 

cos m 0 

The denominator vanishes when 0 is an odd multiple of 7 t/( 2 m); thus 1 — q k z is 
a root of the denominator for 1 k ^ l, and the stated product representation 
must hold. To find the mean and variance we can write 

G m = (1 - 10 2 + 2?0 4 - • • • )/(l - jvn 2 d 2 + ^m 4 0 4 - • • • ) 

= 1 + \(m. 2 — 1 )0 2 + J 4 (5m 4 — 6 m 2 + 1 )0 4 4 

= 1 + \[m. 2 — 1 )(tan0 ) 2 + 2 ^( 5 m 4 — 14m 2 + 9)(tan0 ) 4 + ■ ■ ■ 

= 1 + G^OKtanO ) 2 + lG^(l)(tan0 ) 4 + • •• , 

because tan 2 0 = z — 1 and tan 0 = 0 + j0 3 + • • • . So we have Mean(G m ) = 
j ( m 2 — 1 ) and Var ( G m ) = A m 2 ( m 2 — 1 ) . (Note that this implies the identities 


Trigonometry wins 
again. Is there a 
connection with 
pitching pennies 
along the angles of 
the m-gon? 


m 2 - 1 

2 

m 2 ( m 2 — 1 ) 
6 


( m — 1)/2 


( m — 1 )/2 


L K' 


' . (2k — 1 )7t\ 2 

“ n — 2 ^“ ) : 


( V )/2 ( + (2k- l)7r / . (2k -1)71^2 

2_ ^ 7m / Sm 


The third cumulant of this distribution is jjjm 2 (m 2 — l)(4m 2 — 1); but the 
pattern of nice cumulant factorizations stops there. There’s a much simpler 
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way to derive the mean: We have G m + Ai + • • • + = z(Ai + • • • + AJ + 1 , 

hence when z = 1 we have = Ai + • • • + A^. Since G m = 1 when z = 1 , 
an easy induction shows that A^ = 4k.) 

8.57 We have A:A ^ 2^ and B:B < 2 l ~ ] + 2 1 - 3 and B:A 5> 2 l ~ 2 , hence 
B:B — B:A A:A — A:B is possible only if A:B > 2 l ~ 3 . This means that 

T2 = t 3 , ti = x 4 , r 2 =r 5 , x 3 _ 3 = w But then A:A « 2 l_1 +2 l ~ 4 H , 

A:B « 2 1 - 3 +2 1 - 6 + • • • , B:A « 2 l ~ 2 + 2 l ~ 5 + • • • , and B:B « 2 1 - 1 +2 1 - 4 + ■ • • ; 
hence B:B — B:A is less than A:A — A:B after all. (Sharper results have been 
obtained by Guibas and Odlyzko [168], who show that Bill’s chances are 
always maximized with one of the two patterns Hxi . . . Xi_i or Txi . . . Xt_i . 
Bill’s winning strategy is, in fact, unique; see the following exercise.) 

8.58 (Solution by J. Csirik.) If A is H l or T l , one of the two sequences 

matches A and cannot be used. Otherwise let A = Xi ... Xt_i , H = HA, and 
T = tA. It is not difficult to verify that H:A = T:A = A:A, H:H + T:T = 

2 l_1 + 2( A: A) + 1, and A:H + A:T = 1 +2(A:A) — 2}. Therefore the equation 

H:H — H:A _ T:T — T:A 
A:A — A:H “ A:A — A:T 

implies that both fractions equal 

H:H — H:A + T:T — T:A _ 2 l ~ ] + 1 
A:A — A:H + A:A — A:T “ 2 l - 1 ' 

Then we can rearrange the original fractions to show that 

H:H — H:A _ A:A — A:H _ p 

T:T — T:A “ A:A — A:T “ q ’ 

where p _L q. And (p + 1 )\ gcd(2 l_1 + 1 , 2 l — 1 ) = gcd(3, 2 l — 1 ); so we may 
assume that l is even and that p = 1, q = 2. It follows that A:A — A:H = 
(2 l - 1 )/3 and A:A— A:T = (2 l+1 -2)/3, hence A:H— A:T = (2 l - 1 )/3 ^ 2 1 - 2 . 
We have A:H 2 l ~ 2 if and only if A = (TH) 1 / 2 . But then H:H — H:A = 
A:A - A:H, so 2 l ~ ] + I = 2 l - 1 and l = 2. 

(Csirik [69] goes on to show that, when l 4, Alice can do no better 
than to play HT l ~ 3 H 2 . But even with this strategy, Bill wins with probability 
nearly §.) 

8.59 According to (8.82), we want B:B — B:A > A:A — A:B. One solution is 
A = TTHH, B = HHH. 

8.60 (a) Two cases arise depending on whether h^ 7^ or h^ = h^: 


G(w, z) = 


m — 1 /m — 2 ■ 


m. 


m 


t'HNr 1 ) 


v n— k— 1 
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+ 


1 /m— 1 + wz\ 


k— 1 


WZ 


m 


m 


/m— 1 + zv 
V m / 


n-k-1 


z . 


(b) We can either argue algebraically, taking partial derivatives of G(w,z) 
with respect to w and z and setting w = z = 1; or we can argue com- 
binatorially: Whatever the values of hi , h n i , the expected value of 
P(hi , . . . , h. n _i , h n ; n) is the same (averaged over h n ), because the hash se- 
quence (h-i , . . . , h n _i ) determines a sequence of list sizes (ni , n. 2 , . . . , n m ) 
such that the stated expected value is ((ni +1 ) + (n 2+1 ) + ■ • ■ + (n m +l ))/m = 
(n — 1 + m)/m. Therefore the random variable EP(h.i , . . . , h n ; n) is indepen- 
dent of (hi , . . . , h n _i ), hence independent of P(hi , . . . , H n ; lc) . 

8.61 If 1 sj k < l ^ n, the previous exercise shows that the coefficient of 
SicSi in the variance of the average is zero. Therefore we need only consider 
the coefficient of s£, which is 

y P(h.! , . . . , h n ; k) 2 _ / y P^ , . , . , h n ; k) \ 2 
X- m n \ m n / ’ 

the variance of ((m— 1 + z) /m) k ~ 1 z; and this is (k — 1)(m— l)/m 2 as in 
exercise 30. 


8.62 The pgf D n (z) satisfies the recurrence 


D 0 (z) = z; 

D n (z) = z 2 D n _!(z) + 2(1 -z 3 )D t ' 1 _ 1 (z)/(n+ 1) , for n > 0. 

We can now derive the recurrence 


D"(1) = (n-inD^OJAn-fD + fSn-^/Z, 

which has the solution ^j(n + 2)[26n+15) for all n ^ 11 (regardless of initial 
conditions). Hence the variance comes to ||| (n + 2) for n (> 11. 

8.63 (Another question asks if a given sequence of purported cumulants 
comes from any distribution whatever; for example, K2 must be nonnegative, 
and K4 + 3 k 2 = E((X — p) 4 ) must be at least (E((X — h( 2 )) 2 = k|, etc. 
A necessary and sufficient condition for this other problem was found by 
Hamburger [6], [175].) 

9.1 True if the functions are all positive. But otherwise we might have, 
say, fi (n) = n 3 + n 2 , f 2 (n) = -n 3 , g! (n) = n 4 + n, g 2 (n) = -n 4 . 

9.2 (a) We have n lnn A c n A (lnn) n , since (Inn) 2 -< nine -< nlnlnn. 

(b) n lnlnlnn -<; (Inn)! A n lnlnn . (c) Take logarithms to show that (n!)! wins, 
(d) E 2 h ^ x 4> 21nn = n 21n< t; ~ nlncj) wins because c|) 2 = c[) + 1 < e. 
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9.3 Replacing kn by O(n) requires a different C for each k; but each 0 
stands for a single C. In fact, the context of this O requires it to stand for 
a set of functions of two variables k and n. It would be correct to write 

LLi kn = ILi °( n2 ) = °( n3 )- 

9.4 For example, limn^oc 0(1 /n) = 0. On the left, 0(1 /n) is the set of all 
functions f(n) such that there are constants C and no with I f (rt) I ;/ C/n for 
all n ^ no- The limit of all functions in that set is 0, so the left-hand side is 
the singleton set {0}. On the right, there are no variables; 0 represents {0}, the 
(singleton) set of all “functions of no variables, whose value is zero.” (Can you 
see the inherent logic here? If not, come back to it next year; you probably 
can still manipulate O-notation even if you can’t shape your intuitions into 
rigorous formalisms.) 

9.5 Let f(n) = rt 2 and g(n) = 1; then n is in the left set but not in the 
right, so the statement is false. 


9.6 nlnn + yn + 0(v / rxlnn). 

9.7 (1 — e ~ 1/n ) _1 =nB 0 -B 1 t-B 2 n '/2! ! ••• = n 


l + Ofn - 1 


9.8 For example, let f(n) = [rL/2J ! 2 + n, g(n) = (pn/2] — l)l |"n/2]! + n. 
These functions, incidentally, satisfy f(n) = 0(ng(n)) and g(n) = 0(nf(n)); 
more extreme examples are clearly possible. 


9.9 (For completeness, we assume that there is a side condition n — > oo, 
so that two constants are implied by each O.) Every function on the left has 
the form a(n) + b(n), where there exist constants mo, B, no, C such that 
|a(n)| ^ B | f (n) | for n (> mo and |b(n)| ^ C|g(n)| for n ^ no. Therefore the 
left-hand function is at most max(B, C) (|f(n) | + | g(n) |), for n )> max(mo, no), 
so it is a member of the right side. 


9.10 If g(x) belongs to the left, so that g(x) = cosy for some y, where 
|y| ^ C|x| for some C, then 0^1— g(x) = 2sin 2 (y/2) :/ iy 2 ;/ lC 2 x 2 ; hence 
the set on the left is contained in the set on the right, and the formula is true. 

9.11 The proposition is true. For if, say, |x| |y|, we have (x + y ) 2 4y 2 . 

Thus (x + y ) 2 = 0(x 2 ) + 0(y 2 ). Thus 0(x + y ) 2 = 0((x + y) 2 ) = 0(0(x 2 ) + 
0 (y 2 )) = 0 ( 0 (x 2 )) + 0 ( 0 (y 2 )) = 0 (x 2 ) + 0 (y 2 ). 

9.12 1 + 2/n + 0(n~ 2 ) = (1 + 2/n) (1 + 0(n~ 2 )/(1 + 2/n)) by ( 9 . 26 ), and 
1/(1 + 2 /n) = 0 ( 1 ); now use ( 9 . 26 ). 

9.13 n n (l + 2n~ ] + 0(n~ 2 )) n = u n exp(n(2n _1 + 0(n~ 2 ))) = e 2 n n + 
0 (n n_1 ). 

9.14 It is n n+ P exp((n + |3) (a/n — ^or/n 2 + 0(n -3 ))). 
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(It’s interesting 
to compare this 
formula with the 
corresponding re- 
sult for the middle 
binomial coefficient, 
exercise 9.60.) 


9.15 In ( n 3 T J a n ) = 3nln3 — lnn+ j ln3 — ln27t+ ( jg — l)n 1 + 0(n 3 ), so 
the answer is 

?3n+1 /2 

3 l7n ^(l-fn^ + ^n- 2 + 0(n- 3 )). 

9.16 If l is any integer in the range a ^ l < t we have 


f 1 

o 


B(x)f(l + x) dx 


1/2 

1 

1/2 


B(x)f(l + x) dx — 


1/2 


B(1 — x)f(l + x) dx 


B(x)(f(l + x) — f(l + 1 — x)) dx . 


Since l + x )> l + 1 — x when x tj. I, this integral is positive when f(x) is 
nondecreasing. 

9-17 L m /n> B m(i)z m /m! =ze z/2 /(e z -1)=z/(e z / 2 -1)-z/(e*-l). 

9.18 The text’s derivation for the case a = 1 generalizes to give 

2(2n+1/2)a 

bk(n) = ( 27CTl )«/ 2 e ~ k ~ a/n ' C ^ n ) = 2 2 n “ n _(1+ lx ) / 2+3 e e _k " lX//n ; 


the answer is 2 2n “(7m) (1 - a >/ 2 ar 1/2 (l + 0(n.- 1/2+3e )). 

9.19 H 10 = 2.928968254 « 2.928968256; 10! = 3628800 « 3628712.4; B 10 = 
0.075757576 « 0.075757494; 7t(10) = 4 « 10.0017845; e 01 = 1.10517092 « 
1.10517083; In 1.1 = 0.0953102 « 0.0953083; 1.1111111 « 1.1111000; l.l 0 - 1 = 
1 .00957658 « 1 .00957643. (The approximation to 7t(n) gives more significant 
figures when u is larger; for example, 7t(10 9 ) = 50847534 ss 50840742.) 

9.20 (a) Yes; the left side is o(n) while the right side is equivalent to O(n). 

(b) Yes; the left side is e • . (c) No; the left side is about ^/r^ times the 

bound on the right. 

9.21 We have P n = m = n(lnm — 1 — 1 /In m + 0(1 /log n) 2 ), where 


lnm = Inn + lnlnm — 1/lnu + lnln n/(lnn) 2 + 0(l/logn) 


In In m = In In n - 


In Inn (In Inn) 2 In Inn 


Inn 


2(lnn) 


+ 


(Inn) 


+ 0(1 /log n) 


It follows that 


P n = n In n + In In n — 1 


+ 


In Inn — 2 i (In In n) 2 — 3 In Inn 


Inn 


(Inn) 


+ 0(1 /log n) 
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(A slightly better approximation replaces this 0(1 /log n) 2 by the quantity 
— 5/(lnn) 2 + 0(loglogn/logn) 3 ; then we estimate Pioooooo ~ 15483612.4.) 

9.22 Replace 0(n~ 2k ) by + 0(n~ 4k ) in the expansion of H n kj 

this replaces 0(L3(n 2 )) by — y^I^fn 2 ) + 0(l3(n 4 )) in (9.53). We have 

I 3 (n) = fn- 1 +^n- 2 + 0(n- 3 ) ( 

hence the term 0(n~ 2 ) in (9.54) can be replaced by — -^rW 2 + 0(n~ 3 ). 

9.23 Tih n = ^ 0 <ck<nW(^-k)+2 c H n /(n+l)(n+2). Choose c = e n2/6 = 
£ k>0 g k so that X!k>ot L k = 0 and h n = 0(logn)/n 3 . The expansion of 
Ho$k<n ttk/(n. — k) as in (9.60) now yields nh n = 2cH ri /(n + 1 )(n + 2) + 
0(n+ 2 ), hence 


= e 


2 / 6 /ri + 21nn+0(l)\ 


n J 


J ' 


9.24 (a) If )r k >o| f (k)| < 00 and if f(n — k) = 0(f(n)) when 0 +( k +( n/2, 
we have 


n/2 


^a k b n _ k = ^0(f(k))0(f(n))+ 0(f(n))0(f(n- k)) , 

k— n/2 


k=0 


k=0 


which is 20(f(n) ]T k>0 |f(k)|), so this case is proved, (b) But in this case if 
a n = b n = a~ n , the convolution (n+ 1)a~ n is not 0(a~ n ). 

9.25 S n /( 3 n n ) = £ k=0 n ^/(2n. + 1 ) k . We may restrict the range of summa- 
tion to 0 s; k ^ (logn) 2 , say. In this range n— = n k (l — ( k )/n + 0(k 4 /n 2 )) 
and (2n + 1 ) k = (2n) k (l + ( k ^ 1 )/2n + 0(k 4 /n 2 )), so the summand is 

3k 2 -k /k 4 \\ 

-^ + °b) ’ 


2k ' 1 - 


Hence the sum over k is 2 — 4/n + 0(1/n 2 ). Stirling’s approximation can now 
be applied to ( 3 r ) 1 ) = (3n)!/(2n)!n!, proving (9.2). 

9.26 The minimum occurs at a term B2 m /(2m)(2m— 1 )n 2m_1 where 2m ss 
27m+ and this term is approximately equal to 1 / ( 7ic 2nn ^/tl ) . The absolute 
error in In n! is therefore too large to determine n! exactly by rounding to an 
integer, when n is greater than about e 2n+1 . 


9.27 We may assume that a/-l. Let f(x) = x a ; the answer is 


k“ = C« + 


n 


a+1 


k= 1 


a+l 


r 

T 


+L 

k=1 


B 


2k 


OC 


2k \2k— 1 


Tl 


oc— 2k+1 


-0(n 


a— 2m— 1 


What does a drown- 
ing analytic number 
theorist say? 

log log log log .. . 
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In particular, 

m = -1/2, 

and £(— n) = 

— B n +i/(rv+1 ) 
for integer n > 0 . 


(The constant C a turns out to be £(— a), which is in fact defined by this 
formula when oc > — 1.) 

9.28 In general, suppose f(x) = x“ lnx in Euler’s summation formula, when 
a^-1. Proceeding as in the previous exercise, we find 


^k“lnk = C; + 


n a+1 Inn n 1 


CX+1 


k=l 


oc + 1 (a + 1 ) 2 




B2k / 


(X 


k=l 


2k \2k — 1 


n 


a— 2k+1 


n a Inn 


(Inn + H a H oc _2k+i ) 


+ 0(n 


a— 2m— 1 


logn) ; 


the constant C' a can be shown [74, §3.7] to be — C'(-a). (The logn factor 
in the O term can be removed when a is a positive integer ^ 2m; in that 
case we also replace the kth term of the right sum by f> 2 kCd (2k — 2 — a)! x 
(— 1 ) a n a “ 2k+1 /(2k)! when a < 2k— 1.) To solve the stated problem, we let 
a = 1 and m = 1 , taking the exponential of both sides to get 

Q n = A-n n2/2+n/2+1/12 e- n2/4 (l +0(n- 2 )) , 

where A = eV^-C't-i) ~ 1.2824271291 is “Glaisher’s constant.” 

9.29 Let f(x) = x _1 lnx. A slight modification of the calculation in the 
previous exercise gives 


TL 


L 

k— 1 


Ink 

~k~ 


(Inn) 2 Inn 


L D 2k 

~2k 


B2k^-2k 


k=l 


(Inn 


H 2 k-i) + 0(n 2m 1 logn) , 


where yi « — 0.07281584548367672486 is a “Stieltjes constant” (see the an- 
swer to 9.57). Taking exponentials gives 



9.30 Let g(x) = x l e and f(x) = g (x/y/n). Then n l / 2 ]T k>0 k l e k2/n 
is 


f(x)dx-X^f( k - 1 >(0)-(-l) 


k=1 


Bm(M) 


m! 


f (m) (x) dx 


= n 


1/2 


g(x) dx - X ^n^'V^O) + 0(n-/ 2 ) . 


k=1 
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Since g(x) = x l — x 2+l /1 ! + x 4+l /2! — x 6+l /3! + • • • , the derivatives g ,m * (x) 
obey a simple pattern, and the answer is 


1 

2 


n d+i)/2 



Bi+i B l+3 n 1 

(l + 1 )! 0! (1 + 3)! 1! 


Bl+5 n 

(1 + 5)12! 


0(n- 3 ). 


9.31 The somewhat surprising identity 1 /(c m ~ k + c m ) + 1 /(c m+k + c m ) = 
1 /c m makes the terms for 0 ^ k ^ 2m sum to (m + The remaining 

terms are 


L 


i 

£2m+k _|_ qTr 



l 

02m+k 


i 

02m+ 1 02m 


i ( i 

(.3m+2k (.4m+3k 

1 

^3m+2 c3m 1 



and this series can be truncated at any desired point, with an error not ex- 
ceeding the first omitted term. 

9.32 Hi, 21 = 7t 2 /6 — 1/n + 0(ri+ 2 ) by Euler’s summation formula, since we 
know the constant; and H n is given by (9.89). So the answer is 

ne y+ " 2/6 ( 1 - + 0(n- 2 )) . 

9.33 We have n-/n k = 1 — k(k — 1)n _1 + d k 2 (k — 1) 2 n~ 2 + 0(k 6 n~ 3 ); 
dividing by k! and summing over k ^ 0 yields e — en -1 + |en~ 2 + 0(n~ 3 ). 

9.34 A = e Y ; B = 0; C = e D = le^(l -y); E = 3e x ; F = ^e^y+l ). 

9.35 Since l/k(lnk + 0(1)) = 1/klnk + 0(l/k(logk) 2 ), the given sum 
is X)k=2 1 /Fc In lc + 0(1). The remaining sum is In In n + 0(1) by Euler’s 
summation formula. 


9.36 This works out beautifully with Euler’s summation formula: 


S 


n 


L 

0$k<n 


1 

n 2 +k 2 


+ 


1 

n 2 + x 2 


n 

0 


' n dx 1 1 

0 n 2 + x 2 2 n 2 + x 2 


n 

+ 

0 


B 2 — 2x 

IT (n 2 + x 2 ) 2 


TL 

+ 0(n- 5 ). 

0 


Hence S n = d 7rn 1 — In 2 — j^n 3 + 0(n 5 ). 

9.37 This is 


Y_ (n — qk) [n/(q + 1 ) < k^n/q] 

k,q^1 


The world’s top 
three constants, 

(e, 7t, y) , all appear 
in this answer. 
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The remaining sum is like (9.55) but without the factor p(q). The same 
method works here as it did there, but we get £(2) in place of 1 /£(2), so the 
answer comes to (l - yy)n 2 + O(nlogn). 

9.38 Replace k by n — k and let a k (n) = (n — k) n ~ k (£). Then lna k (n) = 
nlnn — Ink! — k + 0(kn _1 ), and we can use tail-exchange with b k (n) = 
n n e~ k /k!, c k (n) = kb k (n)/n, D n = {k | k ^ Inn}, to get H k=0 Qk(n) = 
n T1 e 1 / e (l + 0(n _1 )). 

9.39 Tail-exchange with b k (n) = (Inn — k/n — |k 2 /n 2 )(lnn) k /k!, c k (n) = 
n~ 3 (lnn) k+3 /k!, D n = {k | 0 ^ k ^ lOlnn}. When k « lOlnn we have 
k! x \/k (10/e) k (lnn) k , so the kth term is O(n- 101n(10/e) logn). The answer 
is nlnn — Inn — l(lnn)(1 + lnn)/n + 0(n~ 2 (logn) 3 ). 

9.40 Combining terms two by two, we find that HJ}. — (H2 k — 517 ) m = 

plus terms whose sum over all k }> 1 is 0(1). Suppose n is even. 
Euler’s summation formula implies that 


n / 2 H m -1 


L 

k=1 


2k 


(ln2e Y k) m_1 +0(1 /k) 


n/2 

L 

k=l 

(In e Y n) m 
m 


k 

■on: 


o(i : 


hence the sum is jHJ} 1 + 0(1 ). In general the answer is l (—1 ) n H™ + 0(1 ). 


9.41 Let a = <f>/cf> = — 4> 2 . We have 


n n 

^lnF k = ^(ln4) k -ln\/5 + ln(l-a k )) 

k=1 k=l 

= U(n 2 +1) ln(>-^ln5+^ln(1-oc k )-^ ln(l-a k ). 

k^l k>n 

The latter sum is )T k>n 0( ak ) = 0(a n ). Hence the answer is 


c[3 Tx(Tv+1)/2 5 -n/2 c + 0 ( c j ) n(n-3)/2 5 -n/2j , 

where C = ( 1 — a) ( 1 — a 2 ) ( 1 — a 3 ) . . . « 1 .2267 42. 
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9.42 The hint follows since r ^ — 0CTl , . < Let 

\k— I// \k/ n— k+1 ^ n— cm+1 1 — oc 

m = L<xtlJ = an — e. Then 



So Xlk< an (k) = (^)O(l), and it remains to estimate (™). By Stirling’s ap- 
proximation we have In ( n ) = — / Inn— (an— e) ln(a— e/n) — ((1 — a)n+e) x 
Inf 1 — a + e/n) + 0(1) = — y Inn — an In a — (1 — a)nln(1 — a) + 0(1 ). 

9.43 The denominator has factors of the form z — cu, where eu is a complex 
root of unity. Only the factor z — 1 occurs with multiplicity 5. Therefore by 
(7.31), only one of the roots has a coefficient D(n 4 ), and the coefficient is 

c = 5/(5!- 1 -5- 10-25-50) = 1/1500000. 

9.44 Stirling’s approximation says that ln(x~“x!/(x — a)!) has an asymp- 
totic series 


— a — (x + j — a) ln(l 


a/x) 





in which each coefficient of x~ k is a polynomial in oc. Hence x~“x!/(x— a)! = 
Co (a) + Ci (a)x _1 + ■ ■ ■ + Cn.(a)x _Tl + 0 (x _n_1 ) as x — > 00, where c n (a) is a 
polynomial in oc. We know that c n (a) = [ “ n ] (—1 ) n whenever a is an integer, 
and [ a a n ] is a polynomial in a of degree 2n; hence c n (a) = [ a a n ] (—1 ) n for 
all real a. In other words, the asymptotic formulas 


= y_ 

k=0 
n 

k=0 

generalize equations (6.13) and (6.11), which hold in the all-integer case. 

9.45 Let the partial quotients of a be (cm , cm, . . . ), and let a m be the con- 
tinued fraction l/(a m + a m +i ) for m / 1. Then D(a, n) = D(ai,n) < 
D(a 2 , |amj) + cm + 3 < D(a3, [azLa-inJJ) + ai + cm + 6 < • • • < D(a m+ i , 
La m [- . • [oci TtJ . . .JJ) -fai -I ba m + 3m < ai . . . a m n+<M H ha m + 3m, 


a 

a — k 
a 

a— k 


(_l ) k x «-k + 0 ( x «- T 1 -i) > 

x «-k _i_ 0(x a_n_1 ) 


(See [220] for fur- 
ther discussion.) 
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A truly Bell-shaped 
summand. 


for all m. Divide by n and let n — > oo; the limit is less than oci . . . a m for 
all m. Finally we have 

1 1 

OCT ...a m = 777 ; 7 < F • 

K(di , . . . , CLrrL — 1 > Qm + 0C m ) tm +1 

9.46 For convenience we write just m instead of m(n). By Stirling’s ap- 
proximation, the maximum value of k n /k! occurs when k « m ss n/lnn, so 
we replace k by m + k and find that 


In 


(m + k) n 
(m + k)! 


= nlnm — mlnm + m — 


ln27tm 


(m + n)k 2 
2 m 2 


+ 0 (k 3 m 2 logn 


Actually we want to replace k by [mj +k; this adds a further 0(km + 1 logn). 
The tail-exchange method with |k| +( m^ 2+e now allows us to sum on k, 
giving a fairly sharp asymptotic estimate in terms of the quantity 0 in ( 9 . 93 ): 


CDn 


girl— 1 m n-m 

— — (®2m 2 /(m+n) + 0(1)) 

v27rrn 


e m-n-1/2 m n 


m 


m + n 


1+0 


logn\\ 
n'/ 2 )) ' 


The requested formula follows, with relative error O (log log n/logn). 

9.47 Let log m n = l + 0, where 0 ^ 0 < 1 . The floor sum is l(n + 1 ) + 1 — 
(m l+1 — 1 )/(m — 1 ); the ceiling sum is (l + 1 )n — (m l+1 — 1 )/(m — 1 ); the 
exact sum is (1 + 0)n — n/ln m + O(logn). Ignoring terms that are o(n), the 
difference between ceiling and exact is (l — f( 0 ))n, and the difference between 
exact and floor is f( 0 )n, where 


f( 0 ) 


m 


-0 


m — 1 


+ 0 - 


1 

In m 


This function has maximum value f(0) = f(l ) = m/(m— 1 ) — 1 /lnm, and its 
minimum value is lnlnm/lnm+ 1 — (ln(m— l))/lnm. The ceiling value is 
closer when n is nearly a power of m, but the floor value is closer when 0 lies 
somewhere between 0 and 1 . 

9.48 Let dk = aic + bk, where ak counts digits to the left of the decimal 
point. Then ak = 1 + [logHkJ = loglogk+ 0(1), where ‘log’ denotes log 10 . 
To estimate bk, let us look at the number of decimal places necessary to 
distinguish p from nearby numbers y — e and y + e': Let 6 = 10~ b be the 
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length of the interval of numbers that round to y. We have |y — y| ^ ^6; 
also y — e<y — ^5 and y + e' > y + Therefore e + e' > 5. And if 
6 < min(e, e'), the rounding does distinguish y from both y — e and y + e'. 
Hence 10~ bk < 1 /(lc — 1) + 1/k and 10 1 ~ bk ^ 1 /k; we have b k = logk + 
0(1). Finally, therefore, JI^ =1 dk = ]T£ =1 (log k + log log k + 0(1)), which is 
nlogn + n log log n + 0(n) by Euler’s summation formula. 

9.49 We have H n > lnn+y+ ^nA 1 — = f ( tl) , where f(x) is increasing 

for all x > 0; hence if n ^ e‘ x ~ Y we have H n f(e“~ Y ) > a. Also H n _i < 
Inn + y — = g(n.), where g(x) is increasing for all x > 0; hence if 

n ^ e“~ Y we have H n _i g(e a ~ Y ) < a. Therefore H n _i ^ a H n implies 
that e“~ Y + 1 > n > e a+Y — 1. (Sharper results have been obtained by Boas 
and Wrench [33].) 

9.50 (a) The expected return is X!i<k<N h/tk 2 !!^') = Hn/H^ 1 , and we 
want the asymptotic value to O ( N - 1 ) : 

InN+y + OtN- 1 ) _ 61n10 6y 361n10 n 
7T 2 /6 — + 0(N~ 2 ) 7t 2 n 7t 2 7T 4 10 n 


The coefficient (61nl0)/7t 2 « 1.3998 says that we expect about 40% profit. 

(b) The probability of profit is L n <k;cN Vl^H^ 1 ) = 1 - Hn'/H^ 2 ’, 
and since H.)t ' =\ — ti _1 + j n ~ 2 + 0(n~ 3 ) this is 


n 1 — Tn 2 + 0(n 3 ) 
7t 2 /6 + 0(N-') 


-^n 1 - 2 + 0(n 3 ), 

TV- 7t z 


actually decreasing with n. (The expected value in (a) is high because it 
includes payoffs so huge that the entire world’s economy would be affected if 
they ever had to be made.) 

9.51 Strictly speaking, this is false, since the function represented by 0(x~ 2 ) 
might not be integrable. (It might be ‘ [x £ S]/x 2 ’, where S is not a measurable 
set.) But if we stipulate that f(x) is an integrable function such that f(x) = 
0(x~ 2 ) as x — » oo, then |J“ f(x) dx| ^ J^°|f(x)| dx ^ Cx~ 2 dx = Cn _1 . 

9.52 In fact, the stack of n’s can be replaced by any function f(n) that 
approaches infinity, however fast. Define the sequence (mo, mi , m 2 , . . . ) by 
setting mo = 0 and letting rrik be the least integer > rrik-i such that 



Now let A(z) = Jlk>i ( z /k) mk . This power series converges for all z, because 
the terms for k > |z| are bounded by a geometric series. Also A(n + 1 ) 
((n + 1 )/n) ^ f (n + 1 ) 2 , hence limn^oo f (n)/A(n) = 0. 


(As opposed to an 
execrable function.) 
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Sounds like a nasty 
theorem. 


9.53 By induction, the O term is (m — 1 )! 1 Jq t m 1 f |m ' (x — t) dt. Since 

f (m+i ) opposite sign to f^ m *, the absolute value of this integral is 

bounded by |f (m) (0) | J* t m - ] dt; so the error is bounded by the absolute 
value of the first discarded term. 

9.54 Let g(x) = f(x)/x“. Then g'(x) ~ — ag(x)/x as x-> oo. By the mean 
value theorem, g(x— j) — g(x + \) = — g'[y) ~ ag(y)/y for some y between 
x - 2 and x+ Now g(y) = g(x) (l + 0(1 /x)), so g(x - \) - g(x + \) ~ 
ag(x)/x = af(x)/x 1+ “. Therefore 

I^rri = o(^(g(k-l)-g(k+l))) = 0(g(n-l)). 

k^n k^n 

9.55 The estimate of (n + k + j) ln(l + k/n) + (n — k + j) ln(l — k/n) is 
extended to k 2 /n + k 4 /6n 3 + 0(n~ 3 / 2+5e ), so we apparently want to have an 
extra factor e ~ k4/6n3 in b k (n), and Ck(n) = 2 2n n~ 2+5e e~ k2/n . But it turns 
out to be better to leave b k (n) untouched and to let 

c k (n) = 2 2n n- 2+5e e- k2/n + 2 2n n- 5+5e k 4 e- k2/n , 

thereby replacing e - k4 / gn3 by l+0(k 4 /n 3 ). The sum ]T k k 4 e- k2/n is 0(n 5/2 ), 
as shown in exercise 30. 

9.56 If k ^ n 1 / 2+e we have ln(n-/'n k ) = k 2 /n + 3, k/n — gk 3 /n 2 + 
0(n~ 1+4e ) by Stirling’s approximation, hence 

n-/n k = e- k2/2n (l + k/2 n- fk 3 /(2n) 2 + 0(n~ 1+4e )) . 

Summing with the identity in exercise 30, and remembering to omit the term 
for k = 0, gives -1 + 0 2n + 0^ - §0^ + 0(n-'/ 2+4e ) = 0^2 - 1 + 

O (n-^ 2+4e ). 

9.57 Using the hint, the given sum becomes ue~ u C(1 +u/lnn) du. The 
zeta function can be defined by the series 

C(1+z) = z- 1 + ^(-ip Ym z m /m!, 

m3>0 

where Yo = Y and Tm is the Stieltjes constant [341, 201] 
lim fV (lnk)m (lnn) m+1 \ 

n— ► k m+1 / 

x k=1 ' 

Hence the given sum is 

Inn + y — 2yi (Inn) -1 + 3Y2(lnn)~ 2 . 
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9.58 Let 0 </ 0 <: 1 and f (z) = e 27tiz0 /(e 27Tiz - 1 ). We have 


f(z) 


e -27ty0 

1 + e- 2n v 


f(z) < 


e - 2n y 0 ^ i 

\e~ 2n y - 1 | ^ 1 - e- 2ne ’ 


when x mod 1 = 
when |y| ^ e. 


Therefore |f(z)| is bounded on the contour, and the integral is 0(M 1_m ). 
The residue of 27tif(z)/z m at z = k ^ 0 is e 27tlk0 /k m ; the residue at z = 0 is 
the coefficient of z _1 in 


:) 27riz0 


ym+1 


Bo + B 


27tiz 

TT 


+ ■••) - z1 ^(b o (0) + B 1 (0)— +-■) 


namely (27ri) m B m (0)/m!. Therefore the sum of residues inside the contour 
is 


(27ti) m 

m! 


Bm(0) + 2 


M 


L 


Ttim/ 2 cos ( 27 Tlc 0 - 7 rm/ 2 ) 

k m 


This equals the contour integral OfM 1 m ), so it approaches zero as M — > oo. 

9.59 If $F(x)$ is sufficiently well behaved, we have the general identity

$$\sum_kF(k + t) = \sum_nG(2\pi n)e^{2\pi int},$$

where $G(y) = \int_{-\infty}^{+\infty}e^{-iyx}F(x)\,dx$. (This is “Poisson’s summation formula,” which can be found in standard texts such as Henrici [182, Theorem 10.6e].)
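For a concrete instance, take the Gaussian $F(x) = e^{-x^2/a}$ (our choice of example, not from the text), whose transform is $G(y) = \sqrt{\pi a}\,e^{-ay^2/4}$. Poisson's formula then says $\sum_ke^{-(k+t)^2/a} = \sqrt{\pi a}\sum_ne^{-\pi^2an^2}e^{2\pi int}$, and both sides can be compared numerically after truncation:

```python
import cmath
import math

def lhs(a: float, t: float, K: int = 60) -> float:
    """sum_{|k| <= K} F(k + t) with F(x) = exp(-x^2/a)."""
    return math.fsum(math.exp(-((k + t) ** 2) / a) for k in range(-K, K + 1))

def rhs(a: float, t: float, N: int = 20) -> float:
    """sum_{|n| <= N} G(2 pi n) e^{2 pi i n t} with G(y) = sqrt(pi a) exp(-a y^2 / 4)."""
    total = sum(
        math.sqrt(math.pi * a) * math.exp(-a * (2 * math.pi * n) ** 2 / 4)
        * cmath.exp(2j * math.pi * n * t)
        for n in range(-N, N + 1)
    )
    return total.real  # imaginary parts cancel in the pairs n, -n

for a, t in ((2.0, 0.3), (0.5, 0.0), (5.0, 0.75)):
    print(a, t, lhs(a, t), rhs(a, t))
```

A narrow Gaussian (small $a$) needs few terms on the left, a wide one few terms on the right; that trade-off is the usual practical reason for applying the formula.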

9.60 The stated formula is equivalent to

$$\binom{n-1/2}{n} = \frac{1}{\sqrt{\pi n}}\Bigl(1 - \frac{1}{8n} + \frac{1}{128n^2} + \frac{5}{1024n^3} - \frac{21}{32768n^4} + O(n^{-5})\Bigr)$$

by exercise 5.22. Hence the result follows from exercises 6.64 and 9.44.
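Because $\binom{n-1/2}{n} = \binom{2n}{n}/4^n$, the expansion can be checked against exact central binomial coefficients. In the sketch below (helper names are ours) the relative error of the truncated series should fall off roughly like $n^{-5}$:

```python
import math

def central_ratio(n: int) -> float:
    """binom(n - 1/2, n) = binom(2n, n) / 4^n, from exact integers (one rounding)."""
    return math.comb(2 * n, n) / 4 ** n

def series(n: int) -> float:
    """The five-term expansion above, divided by sqrt(pi n)."""
    s = 1 - 1 / (8 * n) + 1 / (128 * n**2) + 5 / (1024 * n**3) - 21 / (32768 * n**4)
    return s / math.sqrt(math.pi * n)

for n in (10, 20, 40, 80):
    rel = abs(central_ratio(n) - series(n)) / central_ratio(n)
    print(n, rel)
```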

9.61 The idea is to make $\alpha$ “almost” rational. Let $a_k = 2^{2^{2^k}}$ be the $k$th partial quotient of $\alpha$, and let $n = \frac12a_{m+1}q_m$, where $q_m = K(a_1, \ldots, a_m)$ and $m$ is even. Then $0 < \{q_m\alpha\} < 1/K(a_1, \ldots, a_{m+1}) < 1/(2n)$, and if we take $v = a_{m+1}/(4n)$ we get a discrepancy $\ge\frac14a_{m+1}$. If this were less than $n^{1-\epsilon}$ we would have $a_{m+1}^{\epsilon} = O(q_m^{1-\epsilon})$; but in fact $a_{m+1} > q_m^{2^m}$.

9.62 See Canfield [48]; see also David and Barton [71, Chapter 16] for asymp- 
totics of Stirling numbers of both kinds. 
“The paradox is now
fully established 
that the utmost 
abstractions are the 
true weapons with 
which to control 
our thought of 
concrete fact.” 

—A.N. White- 
head [372] 


9.63 Let $c = \phi^{2-\phi}$. The estimate $cn^{\phi-1} + o(n^{\phi-1})$ was proved by Fine [150].

Ilan Vardi observes that the sharper estimate stated can be deduced from the fact that the error term $e(n) = f(n) - cn^{\phi-1}$ satisfies the approximate recurrence $c^\phi n^{2-\phi}e(n) \approx -\sum_ke(k)[1 \le k < cn^{\phi-1}]$. The function

$$\frac{n^{\phi-1}u(\ln\ln n/\ln\phi)}{\ln n}$$

satisfies this recurrence asymptotically, if $u(x + 1) = -u(x)$. (Vardi conjectures that

$$f(n) = n^{\phi-1}\Bigl(c + u\Bigl(\frac{\ln\ln n}{\ln\phi}\Bigr)(\ln n)^{-1} + O\bigl((\log n)^{-2}\bigr)\Bigr)$$

for some such function $u$.) Calculations for small $n$ show that $f(n)$ equals the nearest integer to $cn^{\phi-1}$ for $1 \le n \le 400$ except in one case: $f(273) = 39 > c\cdot273^{\phi-1} \approx 38.4997$. But the small errors are eventually magnified, because of results like those in exercise 2.36. For example, $e(201636503) \approx 35.73$; $e(919986484788) \approx -1959.07$.

9.64 (From this identity for $B_2(x)$ we can easily derive the identity of exercise 58 by induction on $m$.) If $0 < x < 1$, the integral $\int_x^{1/2}\sin N\pi t\,dt/\sin\pi t$ can be expressed as a sum of $N$ integrals that are each $O(N^{-2})$, so it is $O(N^{-1})$; the constant implied by this $O$ may depend on $x$. Integrating the identity $\sum_{n=1}^{N}\cos2n\pi t = \Re\bigl(e^{2\pi it}(e^{2N\pi it} - 1)/(e^{2\pi it} - 1)\bigr) = -\frac12 + \frac12\sin(2N + 1)\pi t/\sin\pi t$ and letting $N \to \infty$ now gives $\sum_{n\ge1}(\sin2n\pi x)/n = \frac\pi2 - \pi x$, a relation that Euler knew ([107] and [110, part 2, §92]). Integrating again yields the desired formula. (This solution was suggested by E. M. E. Wermuth [367]; Euler’s original derivation did not meet modern standards of rigor.)

9.65 Since $a_0 + a_1n^{-1} + a_2n^{-2} + \cdots = 1 + (n - 1)^{-1}\bigl(a_0 + a_1(n - 1)^{-1} + a_2(n - 1)^{-2} + \cdots\bigr)$, we obtain the recurrence $a_{m+1} = \sum_k\binom mk a_k$, which matches the recurrence for the Bell numbers. Hence $a_m = \varpi_m$.

A slightly longer but more informative proof can be based on the fact that $1/\bigl((n - 1)\ldots(n - m)\bigr) = \sum_k{k\brace m}/n^k$, by (7.47).
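Both routes are easy to check mechanically. The sketch below (helper names are ours) generates $a_m$ from $a_{m+1} = \sum_k\binom mk a_k$ with $a_0 = 1$, compares the result with $\varpi_m = \sum_k{m\brace k}$ computed from Stirling numbers of the second kind, and confirms the expansion of $1/\bigl((n-1)(n-2)(n-3)\bigr)$ for $n = 50$ in exact rational arithmetic.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def a_sequence(count):
    """a_0 = 1 and a_{m+1} = sum_k C(m, k) a_k."""
    a = [1]
    for m in range(count - 1):
        a.append(sum(comb(m, k) * a[k] for k in range(m + 1)))
    return a

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling subset numbers via {n k} = k {n-1 k} + {n-1 k-1}."""
    if n == 0 or k == 0:
        return 1 if n == k else 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

bell = [sum(stirling2(m, k) for k in range(m + 1)) for m in range(10)]
print(a_sequence(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147]
print(bell)

# 1/((n-1)(n-2)(n-3)) versus sum_k {k brace 3} / n^k, truncated at k = 80
n, m = 50, 3
lhs = Fraction(1, (n - 1) * (n - 2) * (n - 3))
rhs = sum(Fraction(stirling2(k, m), n ** k) for k in range(m, 81))
print(float(lhs - rhs))  # only the omitted tail k > 80 remains
```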

9.66 The expected number of distinct elements in the sequence $1, f(1), f(f(1)), \ldots$, when $f$ is a random mapping of $\{1, 2, \ldots, n\}$ into itself, is the function $Q(n)$ of exercise 56, whose value is $\frac12\sqrt{2\pi n} + O(1)$; this might account somehow for the factor $\sqrt{2\pi n}$.
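For small $n$ the first statement can be checked exhaustively. The first $k$ elements $1, f(1), \ldots$ are all distinct for a fraction $n^{\underline k}/n^k$ of all mappings, so the average number of distinct elements should equal $\sum_{k\ge1}n^{\underline k}/n^k$ exactly. The sketch below enumerates all $n^n$ mappings (elements renumbered $0, \ldots, n-1$, starting the orbit at 0) and compares the average with that sum in exact arithmetic; it is only practical for very small $n$, and the helper names are our own.

```python
from fractions import Fraction
from itertools import product
from math import perm

def distinct_in_orbit(f: tuple[int, ...]) -> int:
    """Number of distinct values in 0, f(0), f(f(0)), ... before the first repeat."""
    seen, x = set(), 0
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)

def average_orbit_size(n: int) -> Fraction:
    maps = list(product(range(n), repeat=n))
    return Fraction(sum(distinct_in_orbit(f) for f in maps), len(maps))

def q_exact(n: int) -> Fraction:
    """sum_{k=1}^{n} n(n-1)...(n-k+1) / n^k as an exact fraction."""
    return sum(Fraction(perm(n, k), n ** k) for k in range(1, n + 1))

for n in range(1, 7):
    print(n, average_orbit_size(n), q_exact(n))
```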

9.67 It is known that $\ln x_n \sim \frac32n^2\ln\frac43$; the constant $e^{-\pi/6}$ has been verified empirically to eight significant digits.

9.68 This would fail if, for example, $e^{n-\gamma} = m + \frac12 + \epsilon/m$ for some integer $m$ and some $0 < \epsilon < \frac1{24}$; but no counterexamples are known.
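A direct check for small $n$ is straightforward, assuming the statement at issue is that the least $m$ with $H_m > n$ equals $\lfloor e^{n-\gamma} + \frac12\rfloor$ (our reading; the exercise itself is not reproduced here). The sketch accumulates $H_m$ in floating point, which is accurate enough for these small cases; the helper name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def least_m_exceeding(n: int) -> int:
    """Smallest m with H_m = 1 + 1/2 + ... + 1/m > n (assumed reading of the exercise)."""
    h, m = 0.0, 0
    while h <= n:
        m += 1
        h += 1 / m
    return m

for n in range(1, 13):
    guess = math.floor(math.exp(n - EULER_GAMMA) + 0.5)
    print(n, least_m_exceeding(n), guess)
```

A counterexample would need $e^{n-\gamma}$ to land within about $1/(24m)$ above a half-integer, which is why random-looking agreement for small $n$ is expected but proves nothing in general.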
Credits for Exercises


THE EXERCISES in this book have been drawn from many sources. The 
authors have tried to trace the origins of all the problems that have been 
published before, except in cases where the exercise is so elementary that its 
inventor would probably not think anything was being invented. 

Many of the exercises come from examinations in Stanford’s Concrete 
Mathematics classes. The teaching assistants and instructors often devised 
new problems for those exams, so it is appropriate to list their names here: 


Year   Instructor      Teaching Assistant(s)
1970   Don Knuth       Vaughan Pratt
1971   Don Knuth       Leo Guibas
1973   Don Knuth       Henson Graves, Louis Jouaillec
1974   Don Knuth       Scot Drysdale, Tom Porter
1975   Don Knuth       Mark Brown, Luis Trabb Pardo
1976   Andy Yao        Mark Brown, Lyle Ramshaw
1977   Andy Yao        Yossi Shiloach
1978   Frances Yao     Yossi Shiloach
1979   Ron Graham      Frank Liang, Chris Tong, Mark Haiman
1980   Andy Yao        Andrei Broder, Jim McGrath
1981   Ron Graham      Oren Patashnik
1982   Ernst Mayr      Joan Feigenbaum, Dave Helmbold
1983   Ernst Mayr      Anna Karlin
1984   Don Knuth       Oren Patashnik, Alex Schaffer
1985   Andrei Broder   Pang Chen, Stefan Sharkansky
1986   Don Knuth       Arif Merchant, Stefan Sharkansky


The TA sessions 
were invaluable, 

I mean really great. 


Keep the same 
instructor and the 
same TAs next year. 


Class notes very 
good and useful. 


I never “got” Stir- 
ling numbers. 


In addition, David Klarner (1971), Bob Sedgewick (1974), Leo Guibas (1975), 
and Lyle Ramshaw (1979) each contributed to the class by giving six or more 
guest lectures. Detailed lecture notes taken each year by the teaching assis- 
tants and edited by the instructors have served as the basis of this book. 
3.52 Fraenkel [128].

3.53 S. K. Stein.*
5.81 1986 final exam; [219].

5.82 Hillman and Hoggatt [188]. 

5.85 Hsu [190]. 

5.86 Good [153]. 

5.88 Hermite [186]. 

5.91 Whipple [369]. 

5.92 Clausen [60], [61]. 

5.93 Gosper [154].

5.95 Petkovšek [291, Corollary 3.1].

5.96 Petkovšek [291, Corollary 5.1].

5.98 Ira Gessel.* 

5.102 H. S. Wilf.* 

5.104 Volker Strehl.* 

5.105 Henrici [183, p. 118]. 

5.108 Apéry [14].

5.109 Gessel [146]. 

5.110 R. William Gosper, Jr.* 

5.111 [95, p. 71]. 

5.112 [95, p. 71]. 

5.113 Wilf and Zeilberger [374]. 

5.114 Strehl [344] credits A. Schmidt. 

6.6 Fibonacci [122, p. 283]. 

6.15 [209, exercise 5.1.3-2].

6.21 Theisinger [350]. 

6.25 Gardner [138] credits Denys Wilquin. 

6.27 Lucas [257]. 

6.28 Lucas [259, chapter 18]. 

6.31 Lah [235]; R.W. Floyd.* 
6.35 1977 midterm.

6.37 Shallit [324].

6.39 [207, exercise 1.2.7-15]. 

6.40 Klamkin [202, problem 1979/1]. 

6.41 1973 midterm. 

6.43 Brooke and Wall [41]. 

6.44 Matiiasevich [266]. 

6.46 Francesca [131]; Wallis [360, chap- 
ter 4]. 

6.47 Lucas [257]. 

6.48 [208, exercise 4.5.3-9(c)]. 

6.49 Davison [73]. 

6.50 1985 midterm; Rham [307]; Dijk- 
stra [79, pp. 230-232]. 

6.51 Waring [361]; Lagrange [233]; Wol- 
stenholme [376]. 

6.52 Eswarathasan and Levine [97]. 

6.53 Kaucký [200] treats a special case.

6.54 Staudt [336]; Clausen [62]; Rado [300]. 

6.55 Andrews and Uchimura [13]. 

6.56 1986 midterm. 

6.57 1984 midterm, suggested by R. W. 
Floyd.* 

6.58 [207, exercise 1.2.8-30]; 1982 midterm. 

6.59 Burr [47]. 

6.61 1976 final exam. 

6.62 Borwein and Borwein [36, §3.7]. 

6.63 [207, section 1.2.10]; Stanley [335, 
proposition 1.3.12]. 

6.65 Tanny [349]. 

6.66 [209, exercise 5.1.3-3].

6.67 Chung and Graham [59]. 

6.68 Logan [253]. 

6.69 [209, exercise 6.1-13]. 

6.72 Euler [110, part 2, chapter 8]. 

6.73 Euler [108, chapters 9 and 10]; 
Schröter [321].

6.75 Arnold [15]. 

6.76 Lengyel [248]. 

6.78 Logan [253]. 

6.79 Comic section, Boston Herald, August 21, 1904.

6.80 Silverman and Dunn [329]. 


6.82 [217]. 

6.83 [156], modulo a numerical error. 

6.85 Burr [47]. 

6.86 [226].

6.87 [208, exercises 4.5.3-2 and 3].

6.88 Adams and Davison [3]. 

6.90 Lehmer [243]. 

6.92 Part (a) is from Eswarathasan and 
Levine [97]. 

7.2 [207, exercise 1.2.9-1].

7.8 Zave [380]. 

7.9 [207, exercise 1.2.7-22]. 

7.11 1971 final exam. 

7.12 [209, pp. 63-64].

7.13 Raney [302].

7.15 Bell [24].

7.16 Pólya [296, p. 149]; [207, exercise
2.3.4.4-1].

7.19 [221].

7.20 Jungen [198, p. 299] credits A. 
Hurwitz. 

7.22 Pólya [298].

7.23 1983 homework. 

7.24 Myers [279]; Sedlacek [323]. 

7.25 [208, Carlitz’s proof of lemma 3.3.3B].

7.26 [207, exercise 1.2.8-12]. 

7.32 [95, pp. 25-26] credits L. Mirsky and 
M. Newman. 

7.33 1971 final exam. 

7.34 Tomas Feder.* 

7.36 1974 final exam. 

7.37 Euler [109, §50]; 1971 final exam. 

7.38 Carlitz [49].

7.39 [207, exercise 1.2.9-18]. 

7.41 André [8]; [209, exercise 5.1.4-22].

7.42 1974 final exam. 

7.44 Gross [166]; [209, exercise 5.3.1-3].

7.45 de Bruijn [75]. 

7.47 Waugh and Maxfield [363]. 

7.48 1984 final exam. 

7.49 Waterhouse [362].

7.50 Schröder [320]; [207, exercise 2.3.4.4-31].





7.51 Fisher [124]; Percus [290, pp. 89-123]; 
Stanley [334]. 

7.52 Hammersley [177]. 

7.53 Euler [114, part 2, section 2, chapter 
6, §91]. 

7.54 Moessner [274]. 

7.55 Stanley [333]. 

7.56 Euler [113]. 

7.57 [95, p. 48] credits P. Erdős and
P. Turán.

8.13 Thomas M. Cover.* 

8.15 [207, exercise 1.2.10-17]. 

8.17 Patil [286]. 

8.24 John Knuth (age 4) and DEK; 1975 
final. 

8.26 [207, exercise 1.3.3-18]. 

8.27 Fisher [125]. 

8.29 Guibas and Odlyzko [168]. 

8.32 1977 final exam. 

8.34 Hardy [180] has an incorrect analysis 
leading to the opposite conclusion. 

8.35 1981 final exam. 

8.36 Gardner [139] credits George Sicher- 
man. 

8.38 [208, exercise 3.3.2-10]. 

8.39 [211, exercise 4.3(a)]. 

8.41 Feller [120, exercise IX.33].

8.43 [207, sections 1.2.10 and 1.3.3]. 

8.44 1984 final exam. 

8.46 Feller [120] credits Hugo Steinhaus. 

8.47 1974 final, suggested by “fringe 
analysis” of 2-3 trees. 

8.48 1979 final exam. 

8.49 Blom [32]; 1984 final exam. 

8.50 1986 final exam. 

8.51 1986 final exam. 

8.53 Feller [120] credits S. N. Bernstein. 

8.57 Lyle Ramshaw.* 

8.58 Guibas and Odlyzko [168]. 

9.1 Hardy [179, 1.3(g)]. 

9.2 Part (c) is from Garfunkel [140]. 

9.3 [207, exercise 1.2.11.1-6]. 

9.6 [207, exercise 1.2.11.1-3]. 


9.8 Hardy [179, 1.2(iv)]. 

9.9 Landau [238, vol. 1, p. 60]. 

9.14 [207, exercise 1.2.11.3-6]. 

9.16 Knopp [204, edition 2, §64C]. 

9.18 Bender [25, §3.1]. 

9.20 1971 final exam. 

9.24 [164, §4.1.6]. 

9.27 Titchmarsh [352]. 

9.28 Glaisher [149]. 

9.29 de Bruijn [74, §3.7]. 

9.32 1976 final exam. 

9.34 1973 final exam. 

9.35 1975 final exam. 

9.36 1980 class notes. 

9.37 [208, eq. 4.5.3-21].

9.38 1977 final exam. 

9.39 1975 final exam, inspired by 
Reich [306]. 

9.40 1977 final exam. 

9.41 1980 final exam. 

9.42 1979 final exam. 

9.44 Tricomi and Erdélyi [353].

9.46 de Bruijn [74, §6.3]. 

9.47 1980 homework; [209, eq. 5.3.1-34]. 

9.48 1980 final exam. 

9.49 1974 final exam. 

9.50 1984 final exam. 

9.51 [164, §4.2.1].

9.52 Poincaré [294]; Borel [35, p. 27].

9.53 Pólya and Szegő [299, part 1, problem
140].

9.57 Andrew M. Odlyzko.* 

9.58 Henrici [182, exercise 4.9.8]. 

9.60 [225]. 

9.62 Canfield [48]. 

9.63 Vardi [358]. 

9.65 Comtet [64, chapter 5, exercise 24]. 

9.66 M. P. Schützenberger.*

9.67 Lieb [250]; Stanley [335, exercise
4.37(c)].

9.68 Boas and Wrench [33]. 

* Unpublished personal communication. 
0^0, 162

√2 (≈ 1.41421), 100
√3 (≈ 1.73205), 378
ℑ: imaginary part, 64

ℒ: logarithmico-exponential functions, 442-443
ℜ: real part, 64, 212, 451
γ (≈ 0.57722), see Euler’s constant
Γ, see Gamma function
δ, 47-56

Δ: difference operator, 47-55, 241, 470-471
ε_p(n): largest power of p dividing n, 112-114,
146

ζ, see zeta function
ϑ, 219-221, 310, 347
Θ: Big Theta notation, 448
κ_m, see cumulants
μ, see Möbius function
ν, see nu function

π (≈ 3.14159), 26, 70, 146, 244, 485, 564, 596
π(x), see pi function

σ: standard deviation, 388; see also Stirling’s
constant

σ_n(x), see Stirling polynomials
ϕ (≈ 1.61803): golden ratio, 70, 97, 299-301,
310, 553

φ, see phi function
Φ: sum of φ, 137-139, 462-463
Ω: Big Omega notation, 448
∑-notation, 22-25, 245


∏-notation, 64, 106
/\-notation, 65

⟺: if and only if, 68
⟹: implies, 71
\: divides, 102
\\: exactly divides, 146
⊥: is relatively prime to, 115
≺: grows slower than, 440-443
≻: grows faster than, 440-443
≍: grows as fast as, 442-443
∼: is asymptotic to, 8, 428-429
≈: approximates, 23
≡: is congruent to, 123-126
cardinality, 39
!: factorial, 111-115
¡: subfactorial, 194-200
. . : interval notation, 73-74 
. . . : ellipsis, 21, 50, 108, . . . 

Aaronson, Bette Jane, ix 
Abel, Niels Henrik, 604, 634 
Abramowitz, Milton, 42, 604 
absolute convergence, 60-62, 64 
absolute error, 452, 455 
absolute value of complex number, 64 
absorption identities, 157-158, 261 
Acton, John Emerich Edward Dalberg, Baron, 
66 

Adams, William Wells, 604, 635 
Addison-Wesley, ix


addition formula for $\binom nk$, 158-159
analog for $\left\langle n\atop k\right\rangle$, 268
analogs for $\left\{n\atop k\right\}$ and $\left[n\atop k\right]$, 259, 261
dual, 530 

Aho, Alfred Vaino, 604, 633 
Ahrens, Wilhelm Ernst Martin Georg, 8, 604 
Akhiezer, Naum Il’ich, 604 
Alfred [Brousseau], Brother Ulbertus, 607, 633 
algebraic integers, 106, 147 
algorithms, analysis of, 138, 413-426 
divide and conquer, 79 
Euclid’s, 103, 123, 303-304 
Fibonacci’s, 95, 101 
Gosper’s, 224-227 

Gosper-Zeilberger, 229-241, 254-255
greedy, 101, 295 
self-certifying, 104 
Alice, 31, 408-410, 430 
Allardice, Robert Edgar, 2, 604 
ambiguous notation, 245 
American Mathematical Society, viii 
AMS Euler, ix, 657 
analysis of algorithms, 138, 413-426 
analytic functions, 196 
ancestor, 117, 291 
André, Antoine Désiré, 604, 635
Andrews, George W. Eyre, 215, 330, 530, 575, 
605, 634, 635 

answers, notes on, 497, 637, viii 
anti-derivative operator, 48, 470-471 
anti-difference operator, 48, 54, 470-471 
Apéry, Roger, 238, 605, 629, 634
numbers, 238-239, 255 
approximation, see asymptotics 

of sums by integrals, 45, 276-277, 469-475 
Archibald, Raymond Clare, 608 
argument of hypergeometric, 205
arithmetic progression, 30, 376 
floored, 89-94 
sum of, 6, 26, 30-31 
Armageddon, 85 

Armstrong, Daniel Louis (= Satchmo), 80 

Arnol’d, Vladimir Igorevich, 605, 635 

art and science, 234 

ascents, 267-268, 270 

Askey, Richard Allen, 634 

associative law, 30, 61, 64 


asymptotics, 439-496 

from convergent series, 451 
of Bernoulli numbers, 286, 452 
of binomial coefficients, 248, 251, 495, 598 
of discrepancies, 492, 495 
of factorials, 112, 452, 481-482, 491 
of harmonic numbers, 276-278, 452, 480-481, 
491 

of hashing, 426 

of nth prime, 110-111, 456-457, 490 
of Stirling numbers, 495, 602 
of sums, using Euler’s summation formula, 
469-489 

of sums, using tail-exchange, 466-469, 
486-489 

of sums of powers, 491 
of wheel winners, 76, 453-454 
table of expansions, 452 
usefulness of, 76, 439 
Atkinson, Michael David, 605, 633 
Austin, Alan Keith, 607 
automaton, 405 
automorphic numbers, 520 
average, 384 

of a reciprocal, 432 
variance, 423-425 

B_n, see Bernoulli numbers

Bachmann, Paul Gustav Heinrich, 443, 462, 605 

Bailey, Wilfrid Norman, 223, 548, 605, 634 

Ball, Walter William Rouse, 605, 633 

Banach, Stefan, 433 

Barlow, Peter, 605, 634 

Barton, David Elliott, 602, 609 

base term, 240 

baseball, 73, 148, 195, 519, 648, 653 

BASIC, 173, 446 

basic fractions, 134, 138 

basis of induction, 3, 10-11, 320-321 

Bateman, Harry, 626 

Baum, Lyman Frank, 581 

Beatty, Samuel, 605, 633 

bee trees, 291 

Beeton, Barbara Ann Neuhaus Friend Smith, 
viii 

Bell, Eric Temple, 332, 605, 635 
numbers, 373, 493, 603 
Bender, Edward Anton, 606, 636 
Bernoulli, Jakob (= Jacobi = Jacques = James),
283, 470, 606 

numbers, see Bernoulli numbers 
polynomials, 367-368, 470-475 
polynomials, graphs of, 473 
trials, 402; see also coins, flipping 
Bernoulli, Johann (= Jean), 622 
Bernoulli numbers, 283-290 
asymptotics of, 286, 452 
calculation of, 288, 620 
denominators of, 315 
generalized, see Stirling polynomials 
generating function for, 285, 351, 365 
relation to tangent numbers, 287 
table of, 284, 620 

Bernshtein (= Bernstein), Sergei Natanovich, 636

Bertrand, Joseph Louis François, 145, 606, 633
postulate, 145, 500, 550 
Bessel, Friedrich Wilhelm, functions, 206, 527 
Beyer, William Hyman, 606 
biased coin, 401 
bicycles, 260, 500 
Bieberbach, Ludwig, 617 
Bienaymé, Irénée Jules, 606
Big Ell notation, 444 
Big Oh notation, 76, 443-449 
Big Omega notation, 448 
Big Theta notation, 448 
bijection, 39 
Bill, 408-410, 430 
binary logarithm, 70 

binary notation (radix 2), 11-13, 15-16, 70, 
113-114 

binary partitions, 377 
binary search, 121, 183 
binary trees, 117 

Binet, Jacques Philippe Marie, 299, 303, 

606, 633 

binomial coefficients, 153-242 
addition formula, 158-159 
asymptotics of, 248, 251, 495, 598 
combinatorial interpretation, 153, 158, 160, 
169-170 

definition, 154, 211 
dual, 530 

generalized, 211, 318, 530 
indices of, 154 
middle, 187, 255-256, 495 


reciprocal of, 188-189, 246, 254 
top ten identities of, 174 
wraparound, 250 (exercise 75), 315 
binomial convolution, 365, 367 
binomial distribution, 401-402, 415, 428, 432 
negative, 402-403, 428 
binomial number system, 245 
binomial series, generalized, 200-204, 243, 

252, 363 

binomial theorem, 162-163 

as hypergeometric series, 206, 221
discovered mechanically, 230-233 
for factorial powers, 245 
special cases, 163, 199 
Blom, Gunnar, 606, 636 
bloopergeometric series, 243 
Boas, Ralph Philip, Jr., 600, 606, 636, viii 
Boggs, Wade Anthony, 195 

Bohl, Piers Paul Felix [= Bol’, Pirs Georgievich], 
87, 606 

Bois-Reymond, Paul David Gustav du, 440, 610, 
617 

Boncompagni, Prince Baldassarre, 613 
bootstrapping, 463-466 

to estimate nth prime, 456-457 
Borchardt, Carl Wilhelm, 617 
Borel, Émile Félix Édouard Justin, 606, 636
Borwein, Jonathan Michael, 606, 635 
Borwein, Peter Benjamin, 606, 635 
bound variables, 22 
boundary conditions on sums, 
can be difficult, 75, 86 
made easier, 24-25, 159 
bowling, 6 

box principle, 95, 130, 512 
bracket notation, 

for coefficients, 197, 331 
for true/false values, 24-25 
Brahma, Tower of, 1, 4, 278 
Branges, Louis de, 617 
Brent, Richard Peirce, 306, 525, 564, 606 
bricks, 313, 374 

Brillhart, John David, 606, 633 
Brocot, Achille, 116, 607 
Broder, Andrei Zary, 632, ix 
Brooke, Maxey, 607, 635 
Brousseau, Brother Alfred, 607, 633 
Brown, Mark Robbin, 632 
Brown, Morton, 501, 607 
Brown, Roy Howard, ix
Brown, Thomas Craig, 607, 633 
Brown, Trivial, 607 
Brown, William Gordon, 607 
Brown University, ix 
Browning, Elizabeth Barrett, 320 
Bruijn, Nicolaas Govert de, 444, 447, 500, 609, 
635, 636 
cycle, 500 
bubblesort, 448 
Buckholtz, Thomas Joel, 620 
Bulwer-Lytton, Edward George Earle Lytton, 
Baron, v 

Burma-Shave, 541

Burr, Stefan Andrus, 607, 635 

calculators, 67, 77, 459 
failure of, 344 
calculus, vi, 33 

finite and infinite, 47-56 
candy, 36 

Canfield, Earl Rodney, 602, 607, 636 
cards, 

shuffling, 437 

stacking, 273-274, 280, 309 
Carlitz, Leonard, 607, 635 
Carroll, Lewis (= Dodgson, Rev. Charles 
Lutwidge), 31, 293, 607, 608, 630 
carries, 

across the decimal point, 70 
in divisibility of $\binom{m+n}{n}$, 245, 536
in Fibonacci number system, 297, 561 
Cassini, Jean Dominique, 292, 607 
identity, 292-293, 300 
identity, converse, 314 
identity, generalized, 303, 310 
Catalan, Eugene Charles, 203, 361, 607 
Catalan numbers, 203 

combinatorial interpretations, 358-360, 

565, 568 
generalized, 361 
in sums, 181, 203, 317 
table of, 203 

Cauchy, Augustin Louis, 607, 633 
Čech, Eduard, vi
ceiling function, 67-69 
converted to floor, 68, 96 
graph of, 68 

center of gravity, 273-274, 309 
certificate of correctness, 104 


Chace, Arnold Buffum, 608, 633 
Chaimovich, Mark, 608 
chain rule, 54, 483 
change, 327-330, 374 

large amounts of, 344-346, 492 
changing the index of summation, 30-31, 39 
changing the tails of a sum, 466-469 
cheating, viii, 195, 388, 401 
not, 158, 323 

Chebyshev, Pafnutii L’vovich, 38, 145, 608, 633 
inequality, 390-391, 428, 430 
monotonic inequalities, 38, 576 
cheese slicing, 19 
Chen, Pang-Chieh, 632 
Chinese Remainder Theorem, 126, 146 
Chu Shih-Chieh [= Zhu Shijie], 169 
Chung, Fan-Rong King, ix, 608, 635 
Clausen, Thomas, 608, 634, 635 
product identities, 253 
clearly, clarified, 417-418, 581 
cliches, 166, 324, 357 
closed form, 3, 7, 321 

for generating functions, 317 
not, 108, 573 
pretty good, 346 
closed interval, 73-74 
Cobb, Tyrus Raymond, 195 
coefficient extraction, 197, 331 
Cohen, Henri José, 238
coins, 327-330 
biased, 401 
fair, 401, 430 

flipping, 401-410, 430-432, 437-438 
spinning, 401 

Collingwood, Stuart Dodgson, 608 
Collins, John, 624 

Colombo, Cristoforo (= Columbus, Christopher), 74
coloring, 496 
Columbia University, ix 
combinations, 153 
common logarithm, 449 
commutative law, 30, 61, 64 
failure of, 322, 502, 551 
relaxed, 31 
complete graph, 368 
complex factorial powers, 211 
complex numbers, 64 

roots of unity, 149, 204, 375, 553, 574, 598 
composite numbers, 105, 518
composition of generating functions, 428 
computer algebra, 42, 268, 501, 539 
Comtet, Louis, 609, 636 
Concrete Math Club, 74 
concrete mathematics, defined, vi 
conditional convergence, 59 
conditional probability, 416-419, 424-425 
confluent hypergeometric series, 206, 245 
congruences, 123-126 
Connection Machine, 131 
contiguous hypergeometrics, 529 
continuants, 301-309, 501 
and matrices, 318-319 
Euler’s identity for, 303, 312 
zero parameters in, 314 
continued fractions, 301, 304-309, 319 

large partial quotients of, 553, 563, 564, 602 
convergence, 

absolute, 60-62, 64 
conditional, 59 

of power series, 206, 331, 451, 532 
convex regions, 5, 20, 497 
convolution, 197, 246, 333, 353-364 
binomial, 365, 367 
identities for, 202, 272, 373 
polynomials, 373 
Stirling, 272, 290 

Vandermonde, see Vandermonde convolution 
Conway, John Horton, 410, 609 
cotangent function, 286, 317 
counting, 

combinations, 153 
cycle arrangements, 259-262 
derangements, 193-196, 199-200 
integers in intervals, 73-74 
necklaces, 139-141 
parenthesized formulas, 357-359 
permutations, 111 
permutations by ascents, 267-268 
permutations by cycles, 262 
set partitions, 258-259 
spanning trees, 348-350, 356, 368-369, 374 
with generating functions, 320-330 
coupon collecting, 583 
Cover, Thomas Merrill, 636 
Coxeter, Harold Scott Macdonald, 605 
Cramér, Carl Harald, 525, 609, 634
Cray X-MP, 109 


Crelle, August Leopold, 609, 633 
cribbage, 65 

Crispin, Mark Reed, 628 
Crowe, Donald Warren, 609, 633 
crudification, 447 
Csirik, János András, 590, 609
cubes, sum of consecutive, 51, 63, 283, 289, 367 
cumulants, 397-401 
infinite, 576 

of binomial distribution, 432 
of discrete distribution, 438 
of Poisson distribution, 428-429 
third and fourth, 429, 579, 589 
CUNY (= City University of New York), ix 
Curtiss, David Raymond, 609, 634 
cycles, 

de Bruijn, 500 
of beads, 139-140 
of permutations, 259-262 
cyclic shift, 12 
cyclotomic polynomials, 149 

D, see derivative operator 
Dating Game, 506 

David, Florence Nightingale, 602, 609 
Davis, Philip Jacob, 609 
Davison, John Leslie, 307, 604, 609, 635 
de Branges, Louis, 617 

de Bruijn, Nicolaas Govert, 444, 447, 500, 609, 
635, 636 
cycle, 500 

de Finetti, Bruno, 24, 613 
de Lagny, Thomas Fantet, 304, 621 
de Moivre, Abraham, 297, 481, 609 
Dedekind, Julius Wilhelm Richard, 136-137, 609 
definite sums, analogous to definite integrals, 
49-50 

deg, 226, 232 

degenerate hypergeometric series, 209-210, 216,
222, 247 

derangements, 194-196 

generating function, 199-200 
derivative operator, 47-49 

converting between D and Δ, 470-471
converting between D and ϑ, 310
with generating functions, 33, 333, 364-365 
with hypergeometric series, 219-221 
descents, see ascents 
dgf: Dirichlet generating function, 370 
dice, 381-384

fair, 382, 417, 429 
loaded, 382, 429, 431 
nonstandard, 431 
pgf for, 399-400 
probability of doubles, 427 
supposedly fair, 392 
Dickson, Leonard Eugene, 510, 609 
Dieudonné, Jean Alexandre, 523
difference operator, 47-55, 241 

converting between D and Δ, 470-471
nth difference, 187-192, 280-281 
nth difference of product, 571 
differentiably finite power series, 374, 380 
differential operators, see derivative operator, 
theta operator 

difficulty measure for summation, 181 
Dijkstra, Edsger Wybe, 173, 609, 635 
dimers and dimes, 320, see dominoes and 
change 

diphages, 434, 438 

Dirichlet, Peter Gustav Lejeune, 370, 610, 633 
box principle, 95, 130, 512 
generating functions, 370-371, 373, 432, 451 
probability generating functions, 432 
discrepancy, 88-89, 97 

and continued fractions, 319, 492, 602 
asymptotics of, 492, 495 
discrete probability, 381-438 
defined, 381 
disease, 333 
distribution, 

of fractional parts, 87 
of primes, 111 

of probabilities, see probability distributions 
of things into groups, 83-85 
distributive law, 30, 35, 60, 64 
for gcd and lcm, 145
for mod, 83 
divergent sums, 57, 60 

considered useful, 346-348, 451 
illegitimate, 504, 532 
divide and conquer, 79 
divides exactly, 146 

in binomial coefficients, 245 
in factorials, 112-114, 146 
divisibility, 102-105 
by 3, 147 

of polynomials, 225 


Dixon, Alfred Cardew, 610, 634 
formula, 214 
DNA, Martian, 377 

Dodgson, Charles Lutwidge, see Carroll 
domino tilings, 320-327, 371, 379 
ordered pairs of, 375 
Dorothy Gale, 581 

double generating functions, see super generat- 
ing functions 

double sums, 34-41, 246, 249 
considered useful, 46, 183-185 
faulty use of, 63, 65 
infinite, 61 
over divisors, 105 
telescoping, 255 
doubloons, 436-437 

doubly exponential recurrences, 97, 100, 

101, 109 

doubly infinite sums, 59, 98, 482-483 
Dougall, John, 171, 610 
downward generalization, 2, 95, 320-321 
Doyle, Sir Arthur Conan, 162, 228-229, 405, 610 
drones, 291 

Drysdale, Robert Lewis (Scot), III, 632 
du Bois-Reymond, Paul David Gustav, 440, 610, 
617 

duality, 69 

between £ and 1/n n ~’ , 530 
between factorial and Gamma functions, 211 
between floors and ceilings, 68-69, 96 
between gcd and lcm, 107
between rising and falling powers, 63 
between Stirling numbers of different kinds, 
267 

Dubner, Harvey, 610, 631, 633 
Dudeney, Henry Ernest, 610, 633 
Dunkel, Otto, 614, 633 
Dunn, Angela Fox, 627, 635
Dunnington, Guy Waldo, 610 
duplication formulas, 186, 244 
Dupré, Lyn Oppenheim, ix
Durst, Lincoln Kearney, viii 
Dyson, Freeman John, 172, 239, 610, 615 

e (≈ 2.71828),

as canonical constant, 70, 596 
representations of, 122, 150 
e_n, see Euclid numbers
E: expected value, 385-386 
E: shift operator, 55, 188, 191 
E_n, see Euler numbers
Edwards, Anthony William Fairbank, 610
eeny-meeny-miny-mo, see Josephus problem 
efficiency, different notions of, 24, 133 
egf: exponential generating function, 364 
eggs, 158 

Egyptian mathematics, 95, 150 
bibliography of, 608 
Einstein, Albert, 72, 307 
Eisele, Carolyn, 624-625 

Eisenstein, Ferdinand Gotthold Max, 202, 610 

Ekhad, Shalosh B, 546 

elementary events, 381-382 

Elkies, Noam David, 131, 610 

ellipsis (⋯), 21

advantage of, 21, 25, 50 
disadvantage of, 25 
elimination of, 108 
empirical estimates, 391-393, 427 
empty case, 

for spanning trees, 349, 565 
for Stirling numbers, 258 
for tilings, 320-321 
for Tower of Hanoi, 2 
empty product, 48, 106, 111 
empty sum, 24, 48 
entier function, see floor function 
equality, one-way, 446-447, 489-490 
equivalence relation, 124 
Eratosthenes, sieve of, 111 
Erdélyi, Arthur, 629, 636
Erdős, Pál (= Paul), 418, 525, 548, 575,
610-611, 634, 636 
error function, 166 

errors, absolute versus relative, 452, 455 
errors, locating our own, 183 
Eswarathasan, Arulappah, 611, 635 
Euclid (= Εὐκλείδης), 107-108, 147, 611
algorithm, 103-104, 123, 303-304 
numbers, 108-109, 145, 147, 150, 151 
Euler, Leonhard, i, vii, ix, 6, 48, 122, 132-134, 
202, 205, 207, 210, 267, 277, 278, 286, 299, 
301-303, 469, 471, 513, 529, 575, 603, 605, 
609, 611-613, 629, 633-636 
constant (≈ 0.57722), 278, 306-307, 319,

481, 596 

disproved conjecture, 131 
identity for continuants, 303, 312 
identity for hypergeometrics, 244 


numbers, 559, 570, 620; see also Eulerian 
numbers 

polynomials, 574 
pronunciation of name, 147 
summation formula, 469-475 
theorem, 133, 142, 147 
totient function, see phi function 
triangle, 268, 316 

Eulerian numbers, 267-271, 310, 316, 378, 574 
combinatorial interpretations, 267-268, 557 
generalized, 313 
generating function for, 351 
second-order, 270-271 
table of, 268 
event, 382 

eventually positive function, 442 
exact cover, 376 
exactly divides, 146 

in binomial coefficients, 245 
in factorials, 112-114, 146 
excedances, 316 

exercises, levels of, viii, 72-73, 95, 511 
exp: exponential function, 455 
expectation, see expected value 
expected value, 385-387 
using a pgf, 395 

exponential function, discrete analog of, 54 
exponential generating functions, 364-369, 
421-422 

exponential series, generalized, 200-202, 242, 
364, 369 

exponents, laws of, 52, 63 

F, see hypergeometric functions 
F_n, see Fibonacci numbers

factorial expansion of binomial coefficients, 156, 
211 

factorial function, 111-115, 346-348 

approximation to, see Stirling’s approxima- 
tion 

duplication formula, 244 
generalized to nonintegers, 192, 210-211, 
213-214, 316 

factorial powers, see falling factorial powers, 
rising factorial powers 
factorization into primes, 106-107, 110 
factorization of summation conditions, 36 
fair coins, 401, 430 
fair dice, 382, 417 
falling factorial powers, 47
binomial theorem for, 245 
complex, 211 
difference of, 48, 53, 188 
negative, 52, 63, 188 

related to ordinary powers, 51, 262-263, 598 
related to rising powers, 63, 312 
summation of, 50-53 
fans, ix, 193, 348 
Farey, John, series, 118-119, 617

consecutive elements of, 118-119, 150 
distribution of, 152 

enumeration of, 134, 137-139, 462-463 
Faulhaber, Johann, 288, 613, 620
Feder, Tomás, 635
Feigenbaum, Joan, 632
Feller, William, 381, 613, 636
Fermat, Pierre de, 130, 131, 613
numbers, 131-132, 145, 525 
Fermat’s Last Theorem, 130-131, 150, 524, 555 
Fermat’s theorem (= Fermat’s Little Theorem), 
131-133, 141-143, 149 
converse of, 132, 148 

Fibonacci, Leonardo, 95, 292, 549, 613, 633, 634 
addition, 296-297, 317 
algorithm, 95, 101 
factorial, 492 
multiplication, 561 

number system, 296-297, 301, 307, 310, 318 
odd and even, 307-308 
Fibonacci numbers, 290-301, 575 
and continuants, 302 
and sunflowers, 291 
closed forms for, 299-300, 331 
combinatorial interpretations of, 291-292, 

302, 321, 549 
egf for, 570 

ordinary generating functions for, 297-300, 
337-340, 351 
second-order, 375 
table of, 290, 293 
Fibonomial coefficients, 318, 556 
Fine, Henry Burchard, 625 
Fine, Nathan Jacob, 603 
Finetti, Bruno de, 24, 613 
finite calculus, 47-56 
finite state language, 405 
Finkel, Raphael Ari, 628 
Fisher, Michael Ellis, 613, 636 


Fisher, Sir Ronald Aylmer, 613, 636 
fixed points, 12, 393-394 
pgf for, 400-401, 428 
flipping coins, 401-410, 430-432, 437-438 
floor function, 67-69 

converted to ceiling, 68, 96 
graph of, 68 

Floyd, Robert W, 634, 635 

food, see candy, cheese, eggs, pizza, sherry 

football, 182 

football victory problem, 193-196, 199-200, 428 
generalized, 429 

mean and variance, 393-394, 400-401 
Forcadel, Pierre, 613, 634 
formal power series, 206, 331, 348, 532 
FORTRAN, 446 

Fourier, Jean Baptiste Joseph, 22, 613 
series, 495 
fractional parts, 70 

in Euler’s summation formula, 470 
in polynomials, 100 
related to mod, 83 
uniformly distributed, 87 
fractions, 116-123 
basic, 134, 138 

continued, 301, 304-309, 319, 564 
partial, see partial fraction expansions 
unit, 95, 150 
unreduced, 134-135, 151 
Fraenkel, Aviezri S, 515, 563, 613-614, 633
Frame, James Sutherland, 614, 633 
Francesca, Piero della, 614, 635 
Franel, Jerome, 614 
number, 549 

Fraser, Alexander Yule, 2, 604 
Frazer, William Donald, 614, 634 
Fredman, Michael Lawrence, 513, 614
free variables, 22 
Freiman, Grigoriĭ Abelevich, 608
friendly monster, 545 
frisbees, 434-435, 437 
Frye, Roger Edward, 131 
Fundamental Theorem of Algebra, 207 
Fundamental Theorem of Arithmetic, 106-107 
Fundamental Theorem of Calculus, 48 
Fuss, Nicolaĭ Ivanovich, 361, 614
Fuss-Catalan numbers, 361
Fuss, Paul Heinrich von [= Fus, Pavel Nikolaevich], 611-612
Gale, Dorothy, 581

games, see bowling, cards, cribbage, dice, Penney ante, sports
Gamma function, 210-214, 609 
duplication formula for, 528 
Stirling’s approximation for, 482 
gaps between primes, 150-151, 525 
Gardner, Martin, 614, 634, 636 
Garfunkel, Jack, 614, 636 
Gasper, George, Jr., 223, 614 
Gauß (= Gauss), Karl (= Carl) Friedrich, vii, 6, 7, 123, 205, 207, 212, 501, 510, 529, 610, 615, 633, 634

hypergeometric series, 207
identity for hypergeometrics, 222, 247, 539 
trick, 6, 30, 112, 313 
gcd, 103, see greatest common divisor 
generalization, 11, 13, 16 
downward, 2, 95, 320-321 
generalized binomial coefficients, 211, 318, 530 
generalized binomial series, 200-204, 243, 252, 363
generalized exponential series, 200-202, 242, 364, 369

generalized factorial function, 192, 210-211, 
213-214, 316 

generalized harmonic numbers, 277, 283, 286, 370 
generalized Stirling numbers, 271-272, 311, 316, 
319, 598 

generating functions, 196-204, 297-300, 320-380 
composition of, 428 
Dirichlet, 370-371, 373, 432, 451 
exponential, 364-369, 421-422 
for Bernoulli numbers, 285, 351, 365 
for convolutions, 197, 333-334, 353-364, 369, 421
for Eulerian numbers, 351, 353
for Fibonacci numbers, 297-300, 337-340, 351, 570

for harmonic numbers, 351-352 

for minima, 377 

for probabilities, 394-401 

for simple sequences, 335 

for special numbers, 351-353 

for spectra, 307, 319 

for Stirling numbers, 351-352, 559 

Newtonian, 378 

of generating functions, 351, 353, 421 

super, 353, 421 

table of manipulations, 334 


Genocchi, Angelo, 615 
numbers, 551, 574 
geometric progression, 32 
floored, 114 
generalized, 205-206 
sum of, 32-33, 54 
Gessel, Ira Martin, 270, 615, 634 
Gibbs, Josiah Willard, 630 
Gilbert, William Schwenck, 444 
Ginsburg, Jekuthiel, 615 
Glaisher, James Whitbread Lee, 615, 636 
constant (≈ 1.28243), 595
God, 1, 307, 521 
Goldbach, Christian, 611-612 
theorem, 66 

golden ratio, 299, see phi 
golf, 431 

Golomb, Solomon Wolf, 460, 507, 615, 629, 633 
digit-count sum, 460-462, 490 (exercise 22), 
494 

self-describing sequence, 66, 495 
Good, Irving John, 615, 634 
Goodfellow, Geoffrey Scott, 628 
Gopinath, Bhaskarpillai, 501, 621 
Gordon, Peter Stuart, ix 

Gosper, Ralph William, Jr., 224, 564, 615, 634 
algorithm, 224-227 

algorithm, examples, 227-229, 245, 247-248, 
253-254, 534 

Gosper-Zeilberger algorithm, 229-241, 319
examples, 254-255, 547 
summary, 233 

goto, considered harmful, 173 
Gottschalk, Walter Helbig, vii 
graffiti, vii, ix, 59, 637 
Graham, Cheryl, ix 

Graham, Ronald Lewis, iii, iv, vi, ix, 102, 506, 
608-609, 611, 615-616, 629, 632, 633, 635 
Grandi, Luigi Guido, 58, 616 
Granville, Andrew James, 548 
graph theory, see spanning trees 
graphs of functions, 

1/x, 262-263 

$e^{-x^2/10}$, 483

Bernoulli polynomials, 473 
floor and ceiling, 68 
hyperbola, 440 

partial sums of a sequence, 345-346 
Graves, William Henson, 632 
gravity, center of, 273-274, 309
Gray, Frank, code, 497

greatest common divisor, 92, 103-104, 107, 145 
greatest integer function, see floor function 
greatest lower bound, 65 
greed, 74, 387-388; see also rewards 
greedy algorithm, 101, 295 
Green, Research Sink, 607 
Greene, Daniel Hill, 616 
Greitzer, Samuel Louis, 616, 633 
Gross, Oliver Alfred, 616, 635 
Grünbaum, Branko, 498, 616
Grundy, Patrick Michael, 627, 633 
Guibas, Leonidas Ioannis (= Leo John), 590, 
616, 632, 636 

Guy, Richard Kenneth, 523, 525, 616 

$H_n$, see harmonic numbers
Haar, Alfred, vii 
Hacker’s Dictionary, 124, 628 
Haiman, Mark, 632 
Haland, Inger Johanne, 616, 633 
half-open interval, 73-74 
Hall, Marshall, Jr., 616 
Halmos, Paul Richard, v, vi, 616-617 
Halphen, Georges Henri, 305, 617 
halving, 79, 186-187 
Hamburger, Hans Ludwig, 591, 617 
Hammersley, John Michael, v, 617, 636 
Hanoi, Tower of, 1-4, 26-27, 109, 146 
variations on, 17-20 
Hansen, Eldon Robert, 42, 617 
Hardy, Godfrey Harold, 111, 442-443, 617, 633, 636

harmonic numbers, 29, 272-282 
analogous to logarithms, 53 
asymptotics of, 276-278, 452, 480-481, 491 
complex, 311, 316 
divisibility of, 311, 314, 319 
generalized, 277, 283, 286, 370 
generating function for, 351-352 
second-order, 277, 280, 311, 550-552 
sums of, 41, 313, 316, 354-355 
sums using summation by parts, 56, 279-282, 
312 

table of, 273 

harmonic series, divergence of, 62, 275-276 
Harry, Matthew Arnold, double sum, 249 
hashing, 411-426, 430 
hats, see football victory problem 


hcf, 103, see greatest common divisor 
Heath-Brown, David Rodney, 629 
Heiberg, Johan Ludvig, 611 
Heisenberg, Werner Karl, 481 
Helmbold, David Paul, 632 
Henrici, Peter Karl Eugen, 332, 545, 602, 617, 
634, 636 

Hermite, Charles, 538, 555, 617, 628, 634 
herring, red, 497 
Herstein, Israel Nathan, 8, 618 
hexagon property, 155-156, 242, 251 
highest common factor, see greatest common 
divisor 

Hillman, Abraham P, 618, 634 

Hoare, Charles Antony Richard, 28, 73, 618, 620 

Hofstadter, Douglas Richard, 633 

Hoggatt, Verner Emil, Jr., 618, 622, 634 

Holden, Edward Singleton, 624 

Holmboe, Berndt Michael, 604 

Holmes, Thomas Sherlock Scott, 162, 228-229 

holomorphic functions, 196 

homogeneous linear equations, 239, 543 

horses, 17, 18, 468, 503 

Hsu, Lee-Tsch (= Lietz = Leetch) Ching-Siur, 
618, 634 

Hurwitz, Adolf, 635 
hyperbola, 440 

hyperbolic functions, 285-286 
hyperfactorial, 243, 491 
hypergeometric series, 204-223
confluent, 206, 245 
contiguous, 529 

degenerate, 209-210, 216, 222, 247 
differential equation for, 219-221 
Gaussian, 207 

partial sums of, 165-166, 223-230, 224, 245 
transformations of, 216-223, 247, 253 
hypergeometric terms, 224, 243, 245, 527, 575
similar, 541 

i, 22 

implicit recurrences, 136-139, 193-195, 284 
indefinite summation, 48-49 
by parts, 54-56 

of binomial coefficients, 161, 223-224, 246, 
248, 313 

of hypergeometric terms, 224-229





independent random variables, 384, 427 
pairwise, 437 
products of, 386 
sums of, 386, 396-398 
index set, 22, 30, 61 
index variable, 22, 34, 60 
induction, 3, 7, 10-11, 43 
backwards, 18 
basis of, 3, 320-321 
failure of, 17, 575 
important lesson about, 508, 549 
inductive leap, 4, 43 
infinite sums, 56-62, 64 
doubly, 59, 98, 482-483 
information retrieval, 411-413 
INT function, 67 
insurance agents, 391 
integer part, 70 
integration, 45-46, 48 
by parts, 54, 472 
of generating functions, 333, 365 
interchanging the order of summation, 34-41, 
105, 136, 183, 185, 546 
interpolation, 191-192 
intervals, 73-74 
invariant relation, 117 
inverse modulo m, 125, 132, 147 
inversion formulas, 193 

for binomial coefficients, 192-196 
for Stirling numbers, 264, 310 
for sums over divisors, 136-139 
irrational numbers, 238 

continued fraction representations, 306 
rational approximations to, 122-123 
spectra of, 77, 96, 514 
Stern-Brocot representations, 122-123 
Iverson, Kenneth Eugene, 24, 67, 618, 633 
convention, 24-25, 31, 34, 68, 75 

Jacobi, Carl Gustav Jacob, 64, 618 
polynomials, 543, 605 
Janson, Carl Svante, 618 
Jarden, Dov, 556, 618 
Jeopardy, 361 
joint distribution, 384 
Jonassen, Arne Tormod, 618 
Jones, Bush, 618 

Josephus, Flavius, 8, 12, 19-20, 618 
numbers, 81, 97, 100 
problem, 8-17, 79-81, 95, 100, 144 


recurrence, generalized, 13-16, 79-81, 498 
subset, 20 

Jouaillec, Louis Maurice, 632 
Jungen, Reinwald, 618, 635 

K, see continuants 
Kafkaesque scenario, 274 
Kaplansky, Irving, 8, 618 
Karamata, Jovan, 257, 618 
Karlin, Anna Rochelle, 632 
Kaucky, Josef, 618, 635 
Keiper, Jerry Bruce, 619 
Kellogg, Oliver Dimon, 609 
Kent, Clark (= Kal-El), 372 
kernel functions, 370 
Ketcham, Henry King, 148 
kilometers, 301, 310, 550 
Kilroy, James Joseph, vii 
Kipling, Joseph Rudyard, 260 
Kissinger, Henry Alfred, 379 
Klamkin, Murray Seymour, 619, 633, 635 
Klarner, David Anthony, 632 
knockout tournament, 432-433 
Knoebel, Robert Arthur, 619 
Knopp, Konrad, 619, 636 
Knuth, Donald Ervin, iii-vi, viii, ix, 102, 267, 411, 506, 553, 616, 618-620, 632, 633, 636, 657

numbers, 78, 97, 100 
Knuth, John Martin, 636 
Knuth, Nancy Jill Carter, ix 
Kramp, Christian, 111, 620 
Kronecker, Leopold, 521 
delta notation, 24 
Kruk, John Martin, 519 

Kummer, Ernst Eduard, 206, 529, 620-621, 634 
formula for hypergeometrics, 213, 217, 535
Kurshan, Robert Paul, 501, 621 

$L_n$, see Lucas numbers
Lagny, Thomas Fantet de, 304, 621 
Lagrange (= de la Grange), Joseph Louis, 
comte, 470, 621, 635 
identity, 64 
Lah, Ivo, 621, 634 

Lambert, Johann Heinrich, 201, 363, 613, 621 
Landau, Edmund Georg Hermann, 443, 448, 621, 634, 636

Laplace, Pierre Simon, marquis de, 466, 606, 621 
last but not least, 132, 469 
Law of Large Numbers, 391
lcm, 103, see least common multiple 
leading coefficient, 235 
least common multiple, 103, 107, 145 
of 251, 319, 500 

least integer function, see ceiling function 
least upper bound, 57, 61 
LeChiffre, Mark Well, 148 
left-to-right maxima, 316 
Legendre, Adrien Marie, 621, 633 
polynomials, 543, 573, 575 
Lehmer, Derrick Henry, 526, 622, 633, 635 
Leibniz, Gottfried Wilhelm, Freiherr von, vii, 
168, 616, 622 

Lekkerkerker, Cornelius Gerrit, 622 
Lengyel, Tamas Lorant, 622, 635 
levels of problems, viii, 72-73, 95, 511 
Levine, Eugene, 611, 635 
lexicographic order, 441 
lg: binary logarithm, 70 
L’Hospital, Guillaume François Antoine de, marquis de Sainte Mesme, rule, 340, 396, 542

Li Shanlan Renshu [= Qiuren], 269, 622 

Liang, Franklin Mark, 632 

Lieb, Elliott Hershel, 622, 636 

lies, and statistics, 195 

Lincoln, Abraham, 401 

linear difference operators, 240 

lines in the plane, 4-8, 17, 19 

Liouville, Joseph, 136-137, 622 

little oh notation, 448 

considered harmful, 448-449 
Littlewood, John Edensor, 239 
ln: natural logarithm, 276
discrete analog of, 53-54 
sum of, 481-482 
log: common logarithm, 449 
Logan, Benjamin Franklin (= Tex), Jr., 287, 
622, 634-635 

logarithmico-exponential functions, 442-443 
logarithms, 449 
binary, 70 

discrete analog of, 53-54 
in O-notation, 449 
natural, 276 

Long, Calvin Thomas, 622, 634 
lottery, 387-388, 436-437 
Lou Shituo, 622 


lower index of binomial coefficient, 154 
complex valued, 211 

lower parameters of hypergeometric series, 205 
Loyd, Samuel, 560, 622 
Lucas, François Édouard Anatole, 1, 292,
622-623, 633-635 
numbers, 312, 316, 556 
Luczak, Tomasz Jan, 618 
Lyness, Robert Cranston, 501, 623 

Maclaurin, Colin, 469, 623 
MacMahon, Maj. Percy Alexander, 140, 623 
magic tricks, 293 
Mallows, Colin Lingwood, 506 
Markov, Andrei Andreevich (the elder), processes, 405
Martian DNA, 377 
Martzloff, Jean-Claude, 623 
mathematical induction, 3, 7, 10-11, 43 
backwards, 18 
basis of, 3, 320-321 
failure of, 17, 575 
important lesson about, 508, 549 
Mathews, Edwin Lee (= 41), 8, 21, 94, 105, 
106, 343 

Matiiasevich (= Matijasevich), Iurii (= Yuri) 
Vladimirovich, 294, 623, 635 
Mauldin, Richard Daniel, 611 
Maxfield, Margaret Waugh, 630, 635 
Mayr, Ernst, ix, 632, 633 
McEliece, Robert James, 71 
McGrath, James Patrick, 632 
McKellar, Archie Charles, 614, 634 
mean (average) of a probability distribution, 
384-399 

median, 384, 385, 437 
mediant, 116 

Melzak, Zdzislaw Alexander, vi, 623 
Mendelsohn, Nathan Saul, 623, 634 
Merchant, Arif Abdulhussein, 632 
merging, 79, 175 

Mersenne, Marin, 109-110, 131, 613, 623 
numbers, 109-110, 151, 292 
primes, 109-110, 127, 522-523 
Mertens, Franz Carl Joseph, 139, 623 
constant, 23 
miles, 301, 310, 550 
Mills, Stella, 623 
Mills, William Harold, 623, 634 
minimum, 65, 249, 377 
Mirsky, Leon, 635

mixture of probability distributions, 428 
mnemonics, 74, 164 

Mobius, August Ferdinand, 136, 138, 623 

function, 136-139, 145, 149, 370-371, 462-463 
mod: binary operation, 81-85 
mod: congruence relation, 123-126 
mod 0, 82-83, 515 
mode, 384, 385, 437 
modular arithmetic, 123-129 
modulus, 82 

Moessner, Alfred, 624, 636 
Moivre, Abraham de, 297, 481, 609 
moments, 398-399 
Montgomery, Hugh Lowell, 463, 624 
Montgomery, Peter Lawrence, 624, 634 
Moriarty, James, 162 

Morse, Samuel Finley Breese, code, 302-303, 324, 551

Moser, Leo, 624, 633 

Motzkin, Theodor Samuel, 556, 564, 618, 624 
mountain ranges, 359, 565 
mu function, see Mobius function 
multinomial coefficients, 168, 171-172, 569 
recurrence for, 252 
multinomial theorem, 149, 168 
multiple of a number, 102 

multiple sums, 34-41, 61; see also double sums 
multiple-precision numbers, 127 
multiplicative functions, 134-136, 144, 371 
multisets, 77, 270 

mumble function, 83, 84, 88, 507, 513 
Murdock, Phoebe James, viii 
Murphy’s Law, 74 
Myers, Basil Roland, 624, 635 

name and conquer, 2, 32, 88, 139 
National Science Foundation, ix 
natural logarithm, 53-54, 276, 481-482 
Naval Research, ix 
Navel research, 299 
nearest integer, 95 

rounding to, 195, 300, 344, 491 
unbiased, 507 

necessary and sufficient conditions, 72 
necklaces, 139-141, 259 
negating the upper index, 164-165 
negative binomial distribution, 402-403, 428 
negative factorial powers, 52, 63, 188 
Newman, James Roy, 630 


Newman, Morris, 635 
Newton, Sir Isaac, 189, 277, 624 
series, 189-192 

Newtonian generating function, 378 
Niven, Ivan Morton, 332, 624, 633 
nonprime numbers, 105, 518 
nontransitive paradox, 410 
normal distribution, 438 
notation, x-xi, 2, 637 

extension of, 49, 52, 154, 210-211, 266, 271, 
311, 319 
ghastly, 67, 175 
need for new, 83, 115, 267 
nu function: sum of digits, 

binary (radix 2), 12, 114, 250, 525, 557 
other radices, 146, 525, 552 
null case, 

for spanning trees, 349, 565 
for Stirling numbers, 258 
for tilings, 320-321 
for Tower of Hanoi, 2 
number system, 107, 119 
binomial, 245 

Fibonacci, 296-297, 301, 307, 310, 318 
prime-exponent, 107, 116 
radix, see radix notation 
residue, 126-129, 144 
Stern-Brocot, see Stern-Brocot number 
system 

number theory, 102-152 

o, considered harmful, 448-449 
O-notation, 76, 443-449 
abuse of, 447-448, 489 
one-way equalities with, 446-447, 489-490 
obvious, clarified, 417, 526 
odds, 410 

Odlyzko, Andrew Michael, 81, 564, 590, 616, 
624, 636 

Office of Naval Research, ix 
one-way equalities, 446-447, 489-490 
open interval, 73-74, 96 
operators, 47 

anti-derivative (∫), 48
anti-difference (Σ), 48
derivative (D), 47, 310
difference (Δ), 47
equations of, 188, 191, 241, 310, 471
shift (E, K, N), 55, 240
theta (ϑ), 219, 310
optical illusions, 292, 293, 560
organ-pipe order, 524 
Oz, Wizard of, 581 

Pacioli, Luca, 614 

Palais, Richard Sheldon, viii 

paradoxes, 

chessboard, 293, 317 
coin flipping, 408-410 
pair of boxes, 531, 535, 539 
paradoxical sums, 57 
parallel summation, 159, 174, 208-210 
parentheses, 357-359 
parenthesis conventions, xi 
partial fraction expansions, 298-299, 338-341 
for easy summation and differentiation, 64, 
376, 476, 504, 586 
not always easiest, 374 
of $1/x^{\overline{n+1}}$, 189
of $1/(z^n - 1)$, 558
powers of, 246, 376 
partial quotients, 306 

and discrepancies, 319, 598-599, 602 
large, 553, 563, 564, 602 
partial sums, see indefinite summation 
required to be positive, 359-362 
partition into nearly equal parts, 83-85 
partitions, of the integers, 77-78, 96, 99, 101 
of a number, 330, 377 
of a set, 258-259, 373 
Pascal, Blaise, 155, 156, 624, 633 
Pascal’s triangle, 155 
extended upward, 164 
hexagon property, 155-156, 242, 251 
row lcms, 251 
row products, 243 
row sums, 163, 165-166 
variant of, 250 

Patashnik, Amy Markowitz, ix 
Patashnik, Oren, iii, iv, vi, ix, 102, 506, 616, 632 
Patil, Ganapati Parashuram, 624, 636 
Paule, Peter, 537, 546 

Peirce, Charles Santiago Sanders, 525, 624-625, 
634 

sequence, 151 

Penney, Walter Francis, 408, 625 
Penney ante, 408-410, 430, 437, 438 
pentagon, 314 (exercise 46), 430, 434 
pentagonal numbers, 380 
Percus, Jerome Kenneth, 625, 636 


perfect powers, 66 
periodic recurrences, 20, 179, 498 
permutations, 111-112 
ascents in, 267-268, 270 
cycles in, 259-262 
excedances in, 314 

fixed points in, 193-196, 393-394, 400-401, 
418 

left-to-right maxima in, 314 
random, 393-394, 400-401, 428 
up-down, 377 

without fixed points, see derangements 
personal computer, 109 
perspiration, 234-235 

perturbation method, 32-33, 43-44, 64, 179, 
284-285 

Petkovsek, Marko, 229, 575, 625, 634 
Pfaff, Johann Friedrich, 207, 214, 217, 529, 625, 
634 

reflection law, 217, 247, 539 
pgf: probability generating function, 394 
phages, 434, 438 
phi (≈ 1.61803), 299-301
as canonical constant, 70 
continued fraction for, 310 
in fifth roots of unity, 553 
in solutions to recurrences, 97, 99, 285-286 
Stern-Brocot representation of, 550 
phi function, 133-135 
dgf for, 371 
divisibility by, 151 

Phi function: sum of φ, 137-139, 462-463
Phidias, 299 

philosophy, vii, 11, 16, 46, 71, 72, 75, 91, 170, 
181, 194, 331, 467, 503, 508, 603 
phyllotaxis, 291 
pi (≈ 3.14159), 26, 286

as canonical constant, 70, 416, 423 
large partial quotients of, 564 
Stern-Brocot representation of, 146 
pi function, 110-111, 452, 593 
preposterous expressions for, 516 
Pig, Porky, 496 
pigeonhole principle, 130 
Pincherle, Salvatore, 617 
Pisano, Leonardo, 613, see Fibonacci 
Pittel, Boris Gershon, 576, 618 
pizza, 4, 423 
planes, cutting, 19 
pneumathics, 164
Pochhammer, Leo, 48, 625 
symbol, 48 

pocket calculators, 67, 77, 459 
failure of, 344 

Poincare, Jules Henri, 625, 636 
Poisson, Simeon Denis, 471, 625 
distribution, 428-429, 579 
summation formula, 602 
Pollak, Henry Otto, 616, 633
Polya, George (= Gyorgy), vi, 16, 327, 508, 625, 
633, 635, 636 
polygons, 

dissection of, 379 
triangulation of, 374 
Venn diagrams with, 20 
polynomial argument, 158, 163 
for rational functions, 527 
opposite of, 210 

polynomially recursive sequence, 374 
polynomials, 189 

Bernoulli, 367-368, 470-475 
continuant, 301-309 
convolution, 373 
cyclotomic, 149 
degree of, 158, 226 
divisibility of, 225 
Euler, 574 
Jacobi, 543, 605 
Legendre, 543, 573, 575 
Newton series for, 189-191 
reflected, 339 

Stirling, 271-272, 290, 311, 317, 352 
Poonen, Bjorn, 501, 633 
Porter, Thomas K, 632 

Portland cement, see concrete (in another book) 
power series, 196, see generating functions 
formal, 206, 331, 348, 532 
Pr, 381-382 

Pratt, Vaughan Ronald, 632 
preferential arrangements, 378 (exercise 44) 
primality testing, 110, 148 
impractical method, 133 
prime algebraic integers, 106, 147 
prime numbers, 105-111 
gaps between, 150-151, 525 
largest known, 109-110 
Mersenne, 109-110, 127, 522 
size of nth, 110-111, 456-457 
sum of reciprocals, 22-25 


prime to, 115 

prime-exponent representation, 107, 116 
Princeton University, ix, 427 
probabilistic analysis of an algorithm, 413-426 
probability, 195, 381-438 

conditional, 416-419, 424-425 
discrete, 381-438 
generating functions, 394-401 
spaces, 381 

probability distributions, 367 
binomial, 401-402, 415, 428, 432 
composition or mixture of, 428 
joint, 384 

negative binomial, 402-403, 428 
normal, 438 
Poisson, 428-429, 579 
uniform, 395-396, 420-421 
problems, levels of, viii, 72-73, 95, 511 
product notation, 64, 106 

product of consecutive odd numbers, 186, 270 
progression, see arithmetic progression, geomet- 
ric progression 
proof, 4, 7 

proper terms, 239-241, 255-256 
properties, 23, 34, 72-73 
prove or disprove, 71-72 
psi function, 551 

pulling out the large part, 453, 458 
puns, ix, 220 

Pythagoras of Samos, theorem, 510 

quadratic domain, 147 
quicksort, 28-29, 54 
quotation marks, xi 
quotient, 81 

rabbits, 310 

radix notation, 11-13, 15-16, 109, 195, 526 
length of, 70, 460 

related to prime factors, 113-114, 146-148, 245

Rado, Richard, 625, 635 

Rahman, Mizanur, 223, 614 

Rainville, Earl David, 529, 626 

Ramanujan Aiyangar, Srinivasa, 330 

Ramare, Olivier, 548 

Ramshaw, Lyle Harold, 73, 632, 634, 636 

random constant, 399 

random variables, 383-386; see also independent 
random variables 
Raney, George Neal, 359, 362, 626, 635
lemma, 359-360 
lemma, generalized, 362, 372 
sequences, 360-361 
Rao, Dekkata Rameswar, 626, 633 
rational functions, 207-208, 224-226, 338, 527 
rational generating functions, 338-346 
expansion theorems for, 340-341 
Rayleigh, John William Strutt, 3rd Baron, 77, 626

Read, Ronald Cedric, 625 
real part, 64, 212, 451 
reciprocity law, 94 
Recorde, Robert, 446, 626 
recurrences, 1-20 
and sums, 25-29 

doubly exponential, 97, 100, 101, 109 
floor/ceiling, 78-81 
implicit, 136-138, 193-194, 284 
periodic, 20, 179, 498 
solving, 337-350 
unfolding, 6, 100, 159-160, 312 
unfolding asymptotically, 456 
referee, 175 

reference books, 42, 223, 616, 619 
reflected light rays, 291-292 
reflected polynomials, 339 

reflection law for hypergeometrics, 217, 247, 539 
regions, 4-8, 17, 19 
Reich, Simeon, 626, 636 
Reingold, Edward Martin, 70 
relative error, 452, 455 
relatively prime integers, 108, 115-123 
remainder after division, 81-82 
remainder in Euler’s summation formula, 471, 
474-475, 479-480 
Renz, Peter Lewis, viii 
repertoire method, 14-15, 19, 250 

for Fibonacci-like recurrences, 312, 314, 372
for sums, 26, 44-45, 63 
replicative function, 100 
repunit primes, 516 
residue calculus, 495 
residue number system, 126-129, 144 
retrieving information, 411-413 
rewards, monetary, ix, 256, 497, 525, 575 
Rham, Georges de, 626, 635 
Ribenboim, Paolo, 555, 626, 634 
Rice, Stephan Oswald, 626 


Rice University, ix 

Riemann, Georg Friedrich Bernhard, 205, 626, 
633 

hypothesis, 526 

Riemann’s zeta function, 65, 595 

as generalized harmonic number, 277-278, 286 
as infinite product, 371 
as power series, 601 

dgf’s involving, 370-371, 373, 463, 566, 569 
evaluated at integers, 238, 286, 571, 595, 597 
rising factorial powers, 48 
binomial theorem for, 245 
complex, 211 
negative, 63 

related to falling powers, 63, 312 
related to ordinary powers, 263, 598 
Roberts, Samuel, 626, 633 
rocky road, 36, 37 
Rødseth, Øystein Johan, 626, 634
Rolletschek, Heinrich Franz, 514 
roots of unity, 149, 204, 375, 574, 598 
fifth, 553 

modulo m, 128-129 
Roscoe, Andrew William, 620 
Rosser, John Barkley, 111, 626 
Rota, Gian-Carlo, 516, 626 
roulette wheel, 74-76, 453 
rounding to nearest integer, 95, 195, 300, 

344, 491 
unbiased, 507 
Roy, Ranjan, 626, 634 
rubber band, 274-275, 278, 312, 493 
ruler function, 113, 146, 148 
running time, 413, 425-426 

O-notation for, abused, 447-448 
Ruzsa, Imre Zoltan, 611 

Saalschütz, Louis, 214, 627, 634

identity, 214-215, 234-235, 529, 531 
Saltykov, Al’bert Ivanovich, 463, 627 
sample mean and variance, 391-393, 427 
sample third cumulant, 429 
samplesort, 354 
sandwiching, 157, 165 
Sarkozy, Andras, 548, 627 
Sawyer, Walter Warwick, 207, 627 
Schaffer, Alejandro Alberto, 632 
Schinzel, Andrzej, 525 
Schlomilch, Oscar Xaver, 627 
Schmidt, Asmus Lorenzen, 634 
Schoenfeld, Lowell, 111, 626
Schonheim, Johanen, 608 
Schroder, Ernst, 627, 635 
Schrodinger, Erwin, 430 
Schroter, Heinrich Eduard, 627, 635 
Schützenberger, Marcel Paul, 636
science and art, 234 
Scorer, Richard Segar, 627, 633 
searching a table, 411-413 
Seaver, George Thomas (= 41), 8, 21, 94, 105, 
106, 343 

secant numbers, 317, 559, 570, 620 
second-order Eulerian numbers, 270-271 
second-order Fibonacci numbers, 375 
second-order harmonic numbers, 277, 280, 311, 
550-552 

Sedgewick, Robert, 632 
Sedláček, Jiří, 627, 635
self-certifying algorithms, 104 
self-describing sequence, 66, 495 
self reference, 59, 95, 531-540, 616, 653 
set inclusion in O-notation, 446-447, 490 
Shallit, Jeffrey Outlaw, 627, 635 
Sharkansky, Stefan Michael, 632 
Sharp, Robert Thomas, 273, 627 
sherry, 433 

shift operator, 55, 240 

binomial theorems for, 188, 191 
Shiloach, Joseph (= Yossi), 632 
Shor, Peter Williston, 633 
Sicherman, George Leprechaun, 636 
sideways addition, 12, 114, 146, 250, 552 
Sierpinski, Waclaw, 87, 627, 634 
sieve of Eratosthenes, 111 
Sigma-notation, 22-25 
ambiguity of, 245 
signum function, 502 
Silverman, David L, 627, 635 
similar hyper geometric terms, 541 
skepticism, 71 
Skiena, Steven Sol, 548 

Sloane, Neil James Alexander, 42, 341, 464, 604, 
628, 633 

Slowinski, David Allen, 109 
small cases, 2, 5, 9, 155, 320-321; see also 
empty case 

Smith, Cedric Austen Bardell, 627, 633 

Snowwalker, Luke, 435 

Solov’ev, Aleksandr Danilovitch, 408, 628 


solution, 3, 337 
sorting, 

asymptotic efficiency of, 447-449 
bubblesort, 448 
merge sort, 79, 175 
possible outcomes, 378 
quicksort, 28-29, 54 
samplesort, 354 
spanning trees, 

of complete graphs, 368-369 
of fans, 348-350, 356 
of wheels, 374 
Spec, see spectra 
special numbers, 257-319 
spectra, 77-78, 96, 97, 99, 101 
generating functions for, 307, 319 
spinning coins, 401 
spiral function, 99 
Spohn, William Gideon, Jr., 628 
Sports, see baseball, football, frisbees, golf, 
tennis 

Sprugnoli, Renzo, 564 
square pyramidal numbers, 42 
square root, 

of 1 (mod m), 128-129 
of 2, 100 
of 3, 378 
of -1, 22 

squarefree, 145, 151, 373, 525, 548 
squares, sum of consecutive, 41-46, 51, 180, 245, 
269, 284, 288, 367, 444, 470 
stack size, 360-361 
stacking bricks, 313, 374 
stacking cards, 273-274, 278, 309 
Stallman, Richard Matthew, 628 
standard deviation, 388, 390-394 
Stanford University, v, vii, ix, 427, 458, 632, 634, 657
Stanley, Richard Peter, 270, 534, 615, 628, 635, 636

Staudt, Karl Georg Christian von, 628, 635 

Steele, Guy Lewis, Jr., 628 

Stegun, Irene Anne, 42, 604 

Stein, Sherman Kopald, 633 

Steiner, Jacob, 5, 628, 633 

Steinhaus, Hugo Dyonizy, 636 

Stengel, Charles Dillon (= Casey), 42 

step functions, 87 

Stern, Moriz Abraham, 116, 628 
Stern-Brocot number system, 119-123
related to continued fractions, 306 
representation of $\sqrt3$, 572
representation of y, 306
representation of π, 146
representation of φ, 550
representation of e, 122, 150 
simplest rational approximations from, 
122-123, 146, 519 

Stern-Brocot tree, 116-123, 148, 525 
largest denominators in, 319 
related to continued fractions, 305-306 
Stern-Brocot wreath, 515 
Stewart, Bonnie Madison, 614, 633 
Stickelberger, Ludwig, 628, 633 
Stieltjes, Thomas Jan, 617, 628, 633 
constants, 595, 601 

Stirling, James, 192, 195, 210, 257, 258, 297, 
481, 628 

approximation, 112, 452, 481-482, 491, 496 
approximation, perturbed, 454-455 
constant, 481, 485-489 
polynomials, 271-272, 290, 311, 317, 352 
triangles, 258, 259, 267 
Stirling numbers, 257-267 
as sums of products, 570 
asymptotics of, 495, 602 
combinatorial interpretations, 258-262 
convolution formulas, 272, 290 
duality of, 267 

generalized, 271-272, 311, 316, 319, 598 
generating functions for, 351-352, 559 
identities for, 264-265, 269, 272, 290, 311, 
317, 378 

inversion formulas for, 310 

of the first kind, 259 

of the second kind, 258 

related to Bernoulli numbers, 289-290, 317 (exercise 76)
table of, 258, 259, 267 
Stone, Marshall Harvey, vi 
Straus, Ernst Gabor, 564, 611, 624 
Strehl, Karl Ernst Volker, 549, 629, 634 
subfactorial, 194-196, 250 
summand, 22 
summation, 21-66 

asymptotic, 87-89, 466-496 
by parts, 54-56, 63, 279 
changing the index of, 30-31, 39 


definite, 49-50, 229-241 
difficulty measure for, 181 
factors, 27-29, 64, 236, 248, 275, 543 
in hypergeometric terms, 224-229 
indefinite, see indefinite summation 
infinite, 56-62, 64 

interchanging the order of, 34-41, 105, 136, 
183, 185, 546 
mechanical, 229-241 
on the upper index, 160-161, 175-176 
over divisors, 104-105, 135-137, 141, 370 
over triangular arrays, 36-41 
parallel, 159, 174, 208-210 
sums, 21-66; see also summation 
absolutely convergent, 60-62, 64 
and recurrences, 25-29 
approximation of, by integrals, 45, 276-277, 
469-475 

divergent, see divergent sums 
double, see double sums 
doubly infinite, 59, 98, 482-483 
empty, 24, 48 
floor/ceiling, 86-94 

formal, 321; see also formal power series 
hypergeometric, see hypergeometric series 
infinite, 56-62, 64 

multiple, 34-41, 61; see also double sums 
notations for, 21-25 

of consecutive cubes, 51, 63, 283, 289, 367 
of consecutive integers, 6, 44, 65 
of consecutive mth powers, 42, 283-285, 
288-290, 366-368 

of consecutive squares, 41-46, 51, 180, 245, 
269, 284, 288, 367, 444, 470 
of harmonic numbers, 41, 56, 279-282, 
312-313, 316, 354-355 
paradoxical, 57 

tails of, 466-469, 488-489, 492 
Sun Tsu [= Sunzi, Master Sun], 126 
sunflower, 291 

super generating functions, 353, 421 
superfactorials, 149, 243 
Swanson, Ellen Esther, viii 
Sweeney, Dura Warren, 629 
Swinden, Benjamin Alfred, 633 
Sylvester, James Joseph, 133, 629, 633 
symmetry identities, 

for binomial coefficients, 156-157, 183 

for continuants, 303 

for Eulerian numbers, 268 
Szegedy, Mario, 525, 608, 629
Szego, Gabor, 625, 636 

$T_n$, see tangent numbers
tail exchange, 466-469, 486-489 
tail inequalities, 428, 430 
tail of a sum, 452-455 
tale of a sum, see squares 
tangent function, 287, 317 
tangent numbers, 287, 312, 317, 620 
Tanny, Stephen Michael, 629, 635 
Tartaglia, Nicolo, triangle, 155 
Taylor, Brook, series, 163, 191, 287, 396, 
470-471 

telescoping, 50, 232, 236, 255 
tennis, 432-433 
term, 21 

hypergeometric, 224, 243, 245, 527, 575
term ratio, 207-209, 211-212, 224-225 
TeX, 219, 432, 657
Thackeray, Henry St. John, 618 
Theisinger, Ludwig, 629, 634 
theory of numbers, 102-152 
theory of probability, 381-438 
theta functions, 483, 524 
theta operator, 219-221, 347 

converting between D and ϑ, 310
Thiele, Thorvald Nicolai, 397, 398, 629 
thinking, 503 

big, 2, 441, 458, 483, 486 
not at all, 56, 230, 503 

small, see downward generalization, small 
cases 

three-dots (· · ·) notation, 21
advantage of, 21, 25, 50 
disadvantage of, 25 
elimination of, 108 
tilings, see domino tilings 
Titchmarsh, Edward Charles, 629, 636 
Todd, Horace, 501 
Toledo, Ohio, 73 
Tong, Christopher Hing, 632 
totient function, 133-135 
dgf for, 371 
divisibility by, 151 

summation of, 137-144, 150, 462-463 
Toto, 581 

tournament, 432-433 
Tower of Brahma, 1, 4, 278 


Tower of Hanoi, 1-4, 26-27, 109, 146 
variations on, 17-20 
Trabb Pardo, Luis Isidoro, 632 
transitive law, 124 
failure of, 410 

traps, 154, 157, 183, 222, 542 
trees, 

2-3 trees, 636 
binary, 117 
of bees, 291 

spanning, 348-350, 356, 368-369, 374 
Stern-Brocot, see Stern-Brocot tree 
triangular array, summation over, 36-41 
triangular numbers, 6, 155, 195-196, 260, 380 
triangulation, 374 

Tricomi, Francesco Giacomo Filippo, 629, 636 
tridiagonal matrix, 319 
trigonometric functions, 

related to Bernoulli numbers, 286-287, 317 
related to probabilities, 435, 437 
related to tilings, 379 
trinomial coefficients, 168, 171, 255, 571 
middle, 490 
trinomial theorem, 168 
triphages, 434 

trivial, clarified, 129, 417-418, 618 
Turan, Paul, 636 
typefaces, viii-ix, 657 

Uchimura, Keisuke, 605, 635 
unbiased estimate, 392, 429 
unbiased rounding, 507 
uncertainty principle, 481 
undetermined coefficients, 529 
unexpected sum, 167, 215-216, 236, 247 
unfolding a recurrence, 6, 100, 159-160, 312 
asymptotically, 456 
Ungar, Peter, 629 

uniform distribution, 395-396, 418-419 
uniformity, deviation from, 152; see also 
discrepancy 

unique factorization, 106-107, 147 
unit, 147 

unit fractions, 95, 101, 150 
unwinding a recurrence, see unfolding a 
recurrence 

up-down permutations, 377 

upper index of binomial coefficient, 154 

upper negation, 164-165 

upper parameters of hypergeometric series, 205 
upper summation, 160-161, 176
useless identity, 223, 254 
Uspensky, James Victor, 615, 629, 633 

V: variance, 387-398, 419-425 
van der Poorten, Alfred Jacobus, 629 
Vandermonde, Alexandre Theophile, 169, 629, 634 
Vandermonde’s convolution, 169-170 
as a hypergeometric series, 211-213
combinatorial interpretation, 169-170 
derived mechanically, 234 
derived from generating functions, 198 
generalized, 201-202, 218-219, 248 
with half-integers, 187 
vanilla, 36 

Vardi, Ilan, 525, 548, 603, 620, 629, 633, 636 
variance of a probability distribution, 387-398, 
419-425 

infinite, 428, 587 
Veech, William Austin, 514 
Venn, John, 498, 630, 633 
diagram, 17, 20 
venture capitalists, 493-494 
violin string, 29 
vocabulary, 75 

Voltaire, de (= Arouet, François Marie), 450
von Staudt, Karl Georg Christian, 628, 635 
Vyssotsky, Victor Alexander, 548 

Wall, Charles Robert, 607, 635 
Wallis, John, 630, 635 
Wapner, Joseph Albert, 43 
war, 8, 16, 85, 434 
Waring, Edward, 630, 635 
Waterhouse, William Charles, 630, 635 
Watson, John Hamish, 229, 405 
Waugh, Frederick Vail, 630, 635 
Weaver, Warren, 630 
Weber, Heinrich, 630 
Weisner, Louis, 516, 630 
Wermuth, Edgar Martin Emil, 603, 630 
Weyl, Claus Hugo Hermann, 87, 630 
Wham-O, 435, 443 
wheel, 74, 374 
big, 75 

of Fortune, 453 

Whidden, Samuel Blackwell, viii 
Whipple, Francis John Welsh, 630, 634 
identity, 253 

Whitehead, Alfred North, 91, 503, 603, 630 


Wiles, Andrew John, 131 

Wilf, Herbert Saul, 81, 240, 241, 514, 549, 575, 
620, 624, 630-631, 634 
Williams, Hugh Cowie, 631, 633 
Wilquin, Denys, 634 

Wilson, Sir John, theorem, 132-133, 148, 516, 
609 

Wilson, Martha, 148 
wine, 433 

Witty, Carl Roger, 509 
Wolstenholme, Joseph, 631, 635 
theorem, 554 
Wood, Derick, 631, 633 
Woods, Donald Roy, 628 
Woolf, William Blauvelt, viii 
worm, 

and apple, 430 

on rubber band, 274-275, 278, 312, 493 
Worpitzky, Julius Daniel Theodor, 631 
identity, 269 
wreath, 515 

Wrench, John William, Jr., 600, 606, 636 
Wright, Sir Edward Maitland, 111, 617, 631, 633 
Wythoff (= Wijthoff), Willem Abraham, 614 

Yao, Andrew Chi-Chih, ix, 632 
Yao, Foong Frances, ix, 632 
Yao, Qi, 622

Youngman, Henry (= Henny), 175 
zag, see zig 

Zagier, Don Bernard, 238 
Zapf, Hermann, viii, 620, 657 
Zave, Derek Alan, 631, 635 
Zeckendorf, Edouard, 631 
theorem, 295-296, 563 

Zeilberger, Doron, ix, 229-231, 238, 240, 241, 
631, 634 

zero, not considered harmful, 24-25, 159 
strongly, 24-25 
zeta function, 65, 595 

and the Riemann hypothesis, 526 
as generalized harmonic number, 277-278, 286 
as infinite product, 371 
as power series, 601 

dgf’s involving, 370-371, 373, 463, 566, 569 
evaluated at integers, 238, 286, 571, 595, 597 
Zhu Shijie, see Chu Shih-Chieh 
zig, 7-8, 19 
zig-zag, 19 

Zipf, George Kingsley, law, 419 
Errata, 1994–1997


“And if you happen 
in reading to finde 
any more faultes
not here mentioned, 
as peraduenture you 
may, ... I trust 
you will therefore 
impute no blame 
either vnto me 
or to the Printer, 
but gently amend 
and correct them, 
accepting our good 
minde, which was to 
haue had the booke 
passed to your han- 
des vtterly without 
fault, as touching 
the Printing." 

— from the first 
English edition 
of Euclid [98] 


THIS IS A LIST of all corrections made to the second edition of Concrete 
Mathematics by Graham, Knuth, and Patashnik, after the first appearance of 
that book in February 1994 until the final updates to the hardcover version 
made at the beginning of 1998. All subsequent updates are posted on Internet 
page http://www-cs-faculty.stanford.edu/~knuth/gkp.html, where you 
can also find additional material such as exam questions and answers that 
were prepared after the book was published. 

Minor changes in typographic layout are not listed here; neither are 
changes to the index that were precipitated by changes to the text. With 
these modifications, the book should be perfect. 

But you may disagree. To claim a reward for any anomalies you spot 
that aren’t shown here, please send your comments to D. E. Knuth, Com- 
puter Science Department, Stanford University, Stanford CA 94305-9045, or 
by email to knuth-bug@cs.stanford.edu, as soon as possible. Please use 
email only to report errors, not to ask questions. 

All of the items dated before February 1995, and a few items dated after 
that, were corrected in the second printing. 


Page 20, line 16 


(10 Jan 95) 


whose solution is periodic regardless of the initial values X_0, . . . , X_{k−1}.


Page 76, line 8 from the bottom (18 Dec 96) 


1,000,000,000 1500000.0 1502497 0.166 
Page 87, lines 11 to 14 from the bottom


(13 May 95) 


for all irrational α and all bounded functions f that are continuous almost
everywhere. For example, the average value of {nα} can be found by setting
f(x) = x; we get 1/2. (That’s exactly what we might expect; but it’s nice to
know that it is really, provably true, no matter how irrational α is.)
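As a hedged numerical illustration (not from the book), the sketch below averages f({kα}) for k = 1, ..., N. The choice α = √2, the cutoff N = 100,000 and the helper name weyl_average are assumptions made only for this example; with f(x) = x the average should land near 1/2, and with f(x) = x² near 1/3 = ∫₀¹ x² dx.

```python
import math


def weyl_average(f, alpha, n_terms):
    """Return (1/N) * sum of f({k*alpha}) for k = 1..N, where {x} = x - floor(x)."""
    total = 0.0
    for k in range(1, n_terms + 1):
        x = k * alpha
        total += f(x - math.floor(x))  # fractional part {k*alpha}
    return total / n_terms


alpha = math.sqrt(2)  # an arbitrary irrational choice for this illustration
mean_x = weyl_average(lambda x: x, alpha, 100_000)
mean_x2 = weyl_average(lambda x: x * x, alpha, 100_000)
print(mean_x, mean_x2)  # approximately 0.5 and 0.3333
```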


Page 88, last two lines and top line of page 89


(31 Jan 95) 


The quantity jα′ will be an integer only when j = 0, since α (hence α′) is
irrational; and jα′ − v′ will be an integer for at most one value of j. So we
can change the ceiling terms to floors:


Page 89, line 14 

(31 Jan 95) 

Here ε is a positive error less than vα⁻¹. Exercise 18 proves that S is, simi-

Page 89, line 18 [the first printing was correct] 

(26 Sep 95) 

$$D(\alpha,n)\le D(\alpha',\lfloor\alpha n\rfloor)+\alpha^{-1}+2. \qquad (3.31)$$

Page 89, lines 11 and 12 from the bottom 

(26 Sep 95) 

sufficiently large. Hence theorem (3.28) is true; however, convergence to the
limit is not always very fast. (See exercises 9.45 and 9.61.) 

Page 97, line 7 from the bottom [first printing was correct] 

(26 Sep 95) 

$$D(\alpha,n)\ge D(\alpha',\lfloor\alpha n\rfloor)-\alpha^{-1}-2.$$

Page 109, line 23 

(13 Jan 95) 


out, its 65,050 decimal digits require 78 cents U.S. postage to mail first class. 
Hermann Minkowski
illustrated this 
remarkable binary 
representation 
at the Interna- 
tional Congress of 
Mathematicians in 
Heidelberg, 1904. 


Page 109, line 7 from the bottom 

(31 Dec 97) 

seventeenth century [269]. The Mersenne primes known prior to 1998 occur

Page 109, line 4 from the bottom 

(31 Dec 97) 

110503, 132049, 216091, 756839, 859433, 

1257787, 1398269, and 2976221. 

Page 110, lines 8 and following 

(4 Jan 94) 

been a Mersenne prime, although only a few dozen Mersenne primes are
known. Many people are trying . . .

Page 116, line 19 

(31 Dec 95) 

independently by Moritz Stern [339], a German mathematician, and Achille 

Page 122, new graffito for middle of page 

(15 Apr 96) 


Page 123, lines 2-4 (26 Sep 95) 

example, 878/323 ≈ 2.718266 ≈ .999994e; we obtained this fraction from the first
16 letters of e’s Stern-Brocot representation, and the accuracy is about what 
we would get with 16 bits of e’s binary representation. 
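A minimal sketch of how such a fraction arises: walk down the Stern-Brocot tree toward a real number x, going L when x is smaller than the current mediant and R otherwise, and stop after 16 letters. Using the floating-point value math.e as a stand-in for e, and the name stern_brocot_prefix, are assumptions of this illustration. It prints the path RRLRRLRLLLLRLRRR and the fraction 878/323, whose ratio to e is about 0.999994.

```python
import math
from fractions import Fraction


def stern_brocot_prefix(x, n_letters):
    """Follow the first n_letters moves of the Stern-Brocot path toward x > 0.

    Returns (path, node): path is a string of 'L'/'R' moves, and node is the
    fraction sitting at the end of that path in the tree.
    """
    left_p, left_q = 0, 1    # left bound 0/1
    right_p, right_q = 1, 0  # right bound 1/0 ("infinity")
    path = []
    for _ in range(n_letters):
        mid_p, mid_q = left_p + right_p, left_q + right_q  # mediant = current node
        if x < mid_p / mid_q:
            path.append("L")  # the mediant becomes the new right bound
            right_p, right_q = mid_p, mid_q
        else:
            path.append("R")  # the mediant becomes the new left bound
            left_p, left_q = mid_p, mid_q
    node = Fraction(left_p + right_p, left_q + right_q)
    return "".join(path), node


path, node = stern_brocot_prefix(math.e, 16)
print(path, node, float(node) / math.e)
```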


Page 126, line 21 


(28 Mar 97) 


discovered by Sun Tzu in China, about A.D. 350.


Page 130, line 4 from the bottom in second printing 


(1 Mar 95) 


ception of one that became the most famous of all, because it baffled the 
Page 131, lines 1-4

(1 Mar 95) 

for all positive integers a, b, c, and n, when n > 2. (Of course there are lots 
of solutions to the equations a + b = c and a² + b² = c².) Andrew Wiles
has apparently settled the question at long last; his intricate, epoch-making
proof of (4.46) appears in Annals of Mathematics 142 (1995), 443-551.

Page 146, lines 2-4 from the bottom 

(24 May 96) 

m_1, . . . , m_r be positive integers with m_j ⊥ m_k for 1 ≤ j < k ≤ r; let
m = m_1 . . . m_r; and let a_1, . . . , a_r, A be integers. Then there is exactly
one integer a such that

Page 149, lines 1 and 2 

(25 Apr 95) 

49 Let R(N) be the number of pairs of integers (m, n) such that 1 ≤ m ≤ N,
1 ≤ n ≤ N, and m ⊥ n.

Page 149, line 7 from the bottom 

(24 May 96) 

53 Find all positive integers n such that n \ ⌈(n − 1)!/(n + 1)⌉.

Page 171, line 7 from the bottom 

(9 April 96) 

This follows from a five-parameter identity discovered by John Dougall [82] 
early in the twentieth century. 

Page 172, line 9 

(13 May 95) 

several people shortly thereafter. Exercise 86 gives a “simple” proof of (5.31).

Page 177, line 6 from the bottom 

(24 May 96) 


seen so far. And we’re faced with a sum of 2 1 000000 + 1 terms, so we can’t just
Exercise 84 explains
how to derive (5.61) 
from (5.60). 


We didn’t discuss
convergence of
(5.56), (5.57),
(5.58), . . . either.


The value is infinite 
when z is a neg- 
ative integer and 
w is not an integer. 


Page 200, line 4 from the bottom (1 Dec 94) 


We will prove in Section 7.5 that these functions satisfy the identities 


Page 201, addendum to the graffito (12 May 95) 


Page 202, line 8 from the bottom 

(20 Jun 96) 

$$\mathcal{E}(z)=\sum_{k\ge0}(k+1)^{k-1}\frac{z^k}{k!}=1+z+\tfrac32z^2+\tfrac83z^3+\tfrac{125}{24}z^4+\cdots. \qquad (5.66)$$

Page 206, new graffito about 10 lines from the bottom 

(6 Jun 96) 


Page 211, addendum to the graffito in middle of page 


(7 Nov 94) 


Page 215, line 4 


(13 May 95) 


n+1 , m— n, 1 , j 
im+1, Jj-m+i, 2 


1 = 


m 

n 


integer n m > 0, 


Page 215, line 6 


(13 May 95) 


new. But it really isn’t; the left-hand side can be replaced by a multiple of 


Page 215, line 11 from the bottom 


(13 May 95) 


3 r > l r + b J r +§’ “ n < -n_ i s ’ -n-^s + i 

jr+j, jr+ 1, — n— |s, — n— js + j, — n— jS-H 
Page 215, line 8 from the bottom (13 May 95)


quantities (r, s, n) are replaced respectively by (1 , m — 2n — 1 , n — m). 


Page 222, line 10 (12 Aug 96) 


whenever both infinite sums converge. And in fact both sums always do 
converge, except in the degenerate case when a + b + 1/2 is a nonpositive integer.


Page 232, line 2 in first printing 

(11 Apr 95) 

deg(R) = 1 (assuming that z ≠ ±1), we have d = deg(p) − deg(Q) = 0 and

Page 233, line 11 

(1 Jun 94) 

algorithm can be formulated as follows, when t(n, k) is given: 

Page 237, line 2 from the bottom 

(26 Sep 95) 

And ( n | 1 )z n+1 = (°)z n+1 = T(n, n + 1 ) for all n 0, so 

we obtain 

Page 241, line 4 from the bottom 

(13 Jan 95) 

Exercises 98-108 provide additional examples of the Gosper-Zeilberger 

Page 244, lines 2 and 3 

(24 May 96) 

definition, by showing that the limit in (5.83) is 1/m! when z = m is a 

positive integer. 


Page 245, line 14 

(24 May 96) 


when c is a nonnegative integer. (See (5.115).) Use this idea to evaluate 
Page 253, line 3 from the bottom

(26 Sep 95) 

94 Find Y (£) 5k when n is a positive integer. 

Page 254, line 2 of exercise 99 

(26 Sep 95) 

when t(n, k) = (n + a + b + c + k)!/(n + k)! (c + k)! (b − k)! (a − k)! k!.

Page 272, equations (6.50), (6.51), (6.53) 

(4 Jun 95) 

change to 


n^O n 


Page 289, lines 6 and 7 

(4 Jun 95) 


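$$S_6(n)=n\bigl(n-\tfrac12\bigr)(n-1)\bigl(n-\tfrac12+\alpha\bigr)\bigl(n-\tfrac12-\alpha\bigr)\bigl(n-\tfrac12+\bar\alpha\bigr)\bigl(n-\tfrac12-\bar\alpha\bigr)/7,$$
where $\alpha=2^{-3/2}\,3^{-1/4}\bigl(\sqrt{\sqrt{31}+\sqrt{27}}+i\sqrt{\sqrt{31}-\sqrt{27}}\,\bigr)$.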
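To check the factored form of S_6(n) numerically, the sketch below evaluates the product in complex floating point and compares it with a direct sum. It assumes the convention S_m(n) = 0^m + 1^m + ... + (n − 1)^m and that ᾱ is the complex conjugate of α; the helper names are hypothetical.

```python
import math

# alpha = 2^(-3/2) 3^(-1/4) (sqrt(sqrt31 + sqrt27) + i sqrt(sqrt31 - sqrt27))
ALPHA = 2 ** -1.5 * 3 ** -0.25 * complex(
    math.sqrt(math.sqrt(31) + math.sqrt(27)),
    math.sqrt(math.sqrt(31) - math.sqrt(27)),
)


def s6_factored(n):
    """Evaluate the factored form of S_6(n); the result should be real."""
    y = n - 0.5
    a, abar = ALPHA, ALPHA.conjugate()
    return n * y * (n - 1) * (y + a) * (y - a) * (y + abar) * (y - abar) / 7


def s6_direct(n):
    """S_6(n) = 0^6 + 1^6 + ... + (n-1)^6, computed exactly."""
    return sum(k ** 6 for k in range(n))


for n in range(1, 11):
    v = s6_factored(n)
    print(n, s6_direct(n), round(v.real, 6), abs(v.imag) < 1e-6)
```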



Page 292, lines 10 and 11 from the bottom (27 May 94) 

When n = 6, for example, Cassini’s identity correctly claims that 13·5 − 8²
equals 1. (Johannes Kepler knew this law already in 1608 [202].)
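A short check of Cassini’s identity F_{n+1}F_{n−1} − F_n² = (−1)^n, including the n = 6 case quoted above. This is only an illustrative sketch; the function names are ours.

```python
def fibonacci(n):
    """F_0 = 0, F_1 = 1, F_{k+1} = F_k + F_{k-1}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def cassini_lhs(n):
    """F_{n+1} * F_{n-1} - F_n**2, for n >= 1."""
    return fibonacci(n + 1) * fibonacci(n - 1) - fibonacci(n) ** 2


print(fibonacci(7), fibonacci(5), fibonacci(6), cassini_lhs(6))  # 13 5 8 1
```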
Page 301, lines 19-24


(9 Jul 95) 


of miles per n kilometers for all n ≤ 100, except in the cases n = 4, 12, 54, 62,
75, 83, 91, 96, and 99, when it is off by less than 2/3 mile. And the shift-up 
rule gives either the correctly rounded number of kilometers for n miles, or 
1 km too many, for all n ≤ 113. (The only really embarrassing case is n = 4,
where the individual rounding errors for n = 3 + 1 both go the same direction 
instead of cancelling each other out.) 
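The shift-down claim can be checked by brute force. This sketch assumes that the shift-down rule means: write n as a sum of nonconsecutive Fibonacci numbers (greedy/Zeckendorf form) and replace each F_k by F_{k−1}. It also assumes 1 mile = 1.609344 km. Under those assumptions it lists the n ≤ 100 where the estimate differs from the correctly rounded mileage, and checks that the error always stays below 2/3 mile.

```python
import math

KM_PER_MILE = 1.609344  # assumption: the international mile


def fibonacci_upto(n):
    """Return [F_1, F_2, F_3, ...] = [1, 1, 2, 3, 5, ...], ending past n."""
    fibs = [1, 1]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs


def zeckendorf_positions(n, fibs):
    """Greedy Fibonacci representation of n >= 1.

    Returns positions k >= 1 in `fibs` (fibs[k] is F_{k+1}) with
    n == sum(fibs[k] for k in positions); consecutive positions never occur.
    """
    positions = []
    k = len(fibs) - 1
    while n > 0:
        while fibs[k] > n:
            k -= 1
        positions.append(k)
        n -= fibs[k]
    return positions


def shift_down(n):
    """Estimate the miles in n km by replacing each F_k by F_{k-1}."""
    fibs = fibonacci_upto(n)
    return sum(fibs[k - 1] for k in zeckendorf_positions(n, fibs))


def rounded_miles(n):
    """Correctly rounded number of miles in n km."""
    return math.floor(n / KM_PER_MILE + 0.5)


exceptions = [n for n in range(1, 101) if shift_down(n) != rounded_miles(n)]
worst = max(abs(shift_down(n) - n / KM_PER_MILE) for n in range(1, 101))
print(exceptions, worst < 2 / 3)
```

For n = 4 = 3 + 1 the rule gives 2 + 1 = 3 miles, while 4 km is about 2.49 miles. Both pieces are rounded up, which is the "embarrassing case" described above.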


Page 316, line 4 from the bottom 

(9 Jul 95) 

k ≥ 1

Page 317, line 4 

(4 Jun 95) 

$$\frac{z}{2^n}\cot\frac{z}{2^n}-\frac{z}{2^n}\tan\frac{z}{2^n}+\sum_{k=1}^{2^{n-1}-1}\frac{z}{2^n}\Bigl(\cot\frac{z+k\pi}{2^n}+\cot\frac{z-k\pi}{2^n}\Bigr),$$
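A hedged numerical check, assuming that this line is the right-hand side of an expansion of z cot z. It also assumes the sum runs over 1 ≤ k ≤ 2^{n−1} − 1. Such an expansion follows by repeatedly using cot x − tan x = 2 cot 2x together with cot x = ½(cot(x/2) + cot((x + π)/2)). The test value z = 0.7 and the function names are arbitrary choices.

```python
import math


def cot(x):
    return math.cos(x) / math.sin(x)


def cot_expansion(z, n):
    """Right-hand side for n >= 1; z must not be a multiple of pi."""
    w = 2 ** n
    total = (z / w) * cot(z / w) - (z / w) * math.tan(z / w)
    for k in range(1, 2 ** (n - 1)):  # k = 1 .. 2^(n-1) - 1 (empty when n = 1)
        total += (z / w) * (cot((z + k * math.pi) / w) + cot((z - k * math.pi) / w))
    return total


z = 0.7
for n in range(1, 8):
    print(n, cot_expansion(z, n), z * cot(z))
```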

Page 317, lines 6-9 from the bottom 

(21 Jun 94) 


77 When m and n are integers, n ≠ 0, the value of σ_n(m) is given by (6.48)
if m < 0, by (6.49) if m > n, and by (6.101) if m = 0. Show that in the
remaining cases we have


cr n (m) 


m! (n — m)! 

' ’ k=0 


m 

m — k 


B n -k 
n-k ’ 


integer n / ra > 0. 


Page 319, line 14 from the bottom (1 Nov 94) 


a Are there infinitely many n with p \ a_n, for some fixed prime p?


Page 353, line 6 


(4 Jun 95) 


$$\frac{e^w-e^z}{we^z-ze^w}=\sum_{m,n\ge0}\left\langle {m+n+1\atop m}\right\rangle\frac{w^mz^n}{(m+n+1)!}\,. \qquad (7.60)$$
Page 361, line 20 and following

(13 May 95) 

sequences (a_0, . . . , a_{mn}) arise uniquely in this way, if n > 0: The last term
a_{mn} must be (1 − m). The partial sums . . .

Page 379, line 2 from the bottom 

(17 May 94) 

have the form p n (p) = Hm-o |m|u m > where |^J is a positive integer for 

Page 391, line 5 

(12 May 95) 

between 6.975 million and 7.025 million. 

Page 393, line 1 

(12 May 95) 

We estimate the average spot sum of these dice to be 7.4 ± 2.1/√10 ≈ 7.4 ± 0.7,

Page 419, line 4 

(4 Jun 95) 

Once again we have gained the desired speedup factor of 1/m. If m ≈ n/ln n

Page 419, lines 8 and 9 

(4 Jun 95) 

this distribution is called “Zipf’s law.” Then Mean(S) = n/H_n − 1, Var(S) =
½n(n + 1)/H_n − n²/H_n². The average number of probes for m ≈ n/ln n as

Page 425, line 18 

(24 May 96) 

must set a NEXT entry to n + 1 . These two cases may take different amounts 

Page 426, bottom three lines 

(4 Jun 95) 

m = n/ln n + O(1) and n → ∞ the corresponding results are
$$\mathrm{Mean}(G_T)=\beta\ln n+\alpha+O\bigl((\log n)^2/n\bigr);$$
$$\mathrm{Var}(G_T)=\beta^2\ln n+O\bigl((\log n)^2/n\bigr).$$
Page 433, second graffito should be in French


(8 May 96) 


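“Une rapide opéra-
tion arithmétique
montre que, grâce
à cette ingénieuse
cascade, les xérès
ont toujours au
[...]
‘pousser’ plus loin
le calcul de leur âge
donne le vertige.”
-Revue du vin de
France (Nov 1984)
[Roughly: “A quick arithmetic calculation shows that, thanks to this ingenious cascade, the sherries always have [...]; to ‘push’ the calculation of their age any further is dizzying.”]

Page 457, line 4 from the bottom (9 Jul 95)

For example, when n = 10^6 this estimate comes to 15631363.6 + O(n/log n);

Page 466, line 5 (26 Sep 95)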


° L 


0<k<n 


(logTT -) 2 
k (n - k) 


= O 


(log Tl) 

n 


Page 474, line 3 


(9 Jul 95) 


negative multiple of cos(2πx − ½πm), with error O(2^{−m} max_x |B_m({x})|).


Page 475, line 11 


(9 Jul 95) 


— 9r 


2m+2 f( 2 m+ 1) (x) 

(2m + 2)! 


b 

a ’ 


for some 0 + 0 m + 1; 


(9.80) 


Page 477, line 14 from the bottom 


(9 Jul 95) 


So R4(n) is actually equal to — ln2 + 0(n 5 ), but Euler’s summation 


Page 480, line 4 from the bottom 


(9 Jul 95) 

$$H_n=\ln n+\gamma+\frac1{2n}-\sum_{k=1}^{m}\frac{B_{2k}}{2kn^{2k}}-\theta_{m,n}\frac{B_{2m+2}}{(2m+2)n^{2m+2}}\,, \qquad (9.88)$$

Page 481, line 5 


(27 May 94) 


the value of γ correct to 1271 decimal places, beginning thus:
Page 500, lines 15-18

(31 Jan 95) 

1.19 Not when n > 5. A bent line whose half-lines run at angles θ and
θ + 30° from its apex can intersect four times with another whose half-lines
run at angles φ and φ + 30° only if 30° < |θ − φ| < 150°. We can’t choose
more than 5 angles this far apart from each other. (It is possible to choose 5.)

Page 501, lines 5 and following 

(20 Feb 95) 

1.24 The only known examples are: X_n = 2i sin πr + 1/X_{n−1}, where r is
rational and 0 ≤ r < 1 (all period lengths ≥ 2 occur as r varies); Gauss’s . . .
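To see the period lengths change with r, the sketch below iterates X_n = 2i sin(πr) + 1/X_{n−1} in complex arithmetic and reports the first return to the starting value. The starting value X_0 = 2, the tolerance, and the sample values of r are assumptions for illustration. Starting from a positive real X_0 keeps every iterate's real part positive, so no division by zero can occur. We expect periods 2, 3, 4, 5 for r = 0, 1/6, 1/4, 1/10.

```python
import math


def period(r, x0=2.0, max_steps=100, tol=1e-9):
    """Smallest p >= 1 with X_p == X_0 for X_n = 2i sin(pi r) + 1/X_{n-1}, else None."""
    a = 2j * math.sin(math.pi * r)
    x = complex(x0)
    for p in range(1, max_steps + 1):
        x = a + 1 / x
        if abs(x - x0) < tol:
            return p
    return None


for r in (0, 1 / 6, 1 / 4, 1 / 10):
    print(r, period(r))
```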

Page 509, lines 5 and 6 

(26 Sep 95) 

3.29 D(α′, ⌊αn⌋) is at most the maximum of the absolute value of
s(α′, ⌊αn⌋, v′) = −s(α, n, v) − S + ε + {0 or 1} + v′ − {0 or 1}.

Page 512, line 5 from the bottom 

(8 May 96) 

(_1 lMn)<y(n)]_ 

Page 520, lines 4-6 from the bottom 

(25 Apr 95) 

4.49 (a) Either m < n (Φ(N) − 1 cases) or m = n (one case) or m > n
(Φ(N) − 1 again). Hence R(N) = 2Φ(N) − 1. (b) From (4.62) we get
$$2\Phi(N)-1=-1+\sum_{d\ge1}\mu(d)\lfloor N/d\rfloor\lfloor 1+N/d\rfloor;$$
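A brute-force sanity check of both parts of this answer for small N. Here R(N) counts pairs 1 ≤ m, n ≤ N with m ⊥ n, Φ(N) = φ(1) + ... + φ(N), and the Möbius sum uses ⌊1 + N/d⌋ = 1 + ⌊N/d⌋. Terms with d > N vanish, so the sum stops at N. The helper names and the trial-division Möbius function are our own illustrative choices.

```python
from math import gcd


def totient(k):
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def mobius(d):
    """Mobius function mu(d) by trial division."""
    result, m, p = 1, d, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # p^2 divides d
            result = -result
        p += 1
    if m > 1:
        result = -result  # one remaining prime factor
    return result


def R(N):
    """Number of pairs (m, n) with 1 <= m, n <= N and gcd(m, n) == 1."""
    return sum(1 for m in range(1, N + 1) for n in range(1, N + 1) if gcd(m, n) == 1)


def Phi(N):
    return sum(totient(k) for k in range(1, N + 1))


def mobius_formula(N):
    """-1 + sum_{d>=1} mu(d) floor(N/d) floor(1 + N/d)."""
    return -1 + sum(mobius(d) * (N // d) * (1 + N // d) for d in range(1, N + 1))


print([(N, R(N), 2 * Phi(N) - 1, mobius_formula(N)) for N in range(1, 6)])
```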


Page 521, bottom line 

(24 May 96) 

⌈(n − 1)!/(n + 1)⌉ = ((n − 1)! + n)/(n + 1);

Page 525, line 4 of answer 4.69 

(20 Jun 96) 

in exercise 60 are the best that were known in 1994 [255]. 

Exercise 68 
Page 530, line 15


(6 Jun 96) 

for all m > 0 and n > 0, by induction on m + n. 

Page 531, line 5 from the bottom 

(4 Jun 95) 

$$\binom{n+a-1}{n}F\Bigl({a,\,b,\,-n\atop c,\,a}\Bigm|1\Bigr)=\frac{a^{\overline n}(c-b)^{\overline n}}{n!\,c^{\overline n}}$$


Page 532, new graffito for top of page 

(13 Jan 95) 


Page 540, line 7 from the bottom (26 Sep 95) 


quently, when m ≥ 0 is an integer less than n, we have


Page 547, correction to first line of graffito (21 Jan 95) 


Page 551, lines 5-8 from the bottom (9 Jul 95) 


valid for k = n — 1 if we write it in the form 

$$K_{m+n}(x_1,\ldots,x_{m+n})\,K_{n-1}(x_{m+n-1},\ldots,x_{m+1})=K_{m+n-1}(x_1,\ldots,x_{m+n-1})\,K_n(x_{m+n},\ldots,x_{m+1})-(-1)^nK_{m-1}(x_1,\ldots,x_{m-1})\,.$$

[Also delete ‘k = 2,’ where it appears on the bottom line.] 


Page 555, line 2 (1 Nov 94) 


16735, and 102728. See the answer to exercise 92.) 


Term limits? 


Notice that 1/nk 
“Curiously, a_{2n} is
equal to the
square of the num-
ber of ways to tile
a 3 × 2n rectangle
with dominoes; and
a_{2n+1} = 2V_{n+1}².”

— I. Kaplansky 


“Toto, I’ve a 
feeling we’re 
not in Kansas 
anymore." 

— Dorothy 


Page 564, lines 13 and 14 

(14 Jun 94) 

O = Lk (£)k n (- 1 ) m - k /m! for integer m /> 
( 6 . 3 ) holds for all n 0 1 . 

0 and arbitrary n ig: 0 ; then 

Page 564, lines 15 and 16 

(1 Nov 94) 

6.92 (a) David Boyd has shown that there are only finitely many solutions
for all p < 500, except perhaps p = 83, 127, 397. (b) The behavior of b_n is
quite . . .

Page 568, new graffito for the answer to 7.23 

(15 Aug 94) 


Page 578, lines 3 and 4 of answer 22 

(13 May 96) 

(E(E(X|Y)))². But E(E(X|Y)) = Σ_y Pr(Y = y)E(X|y) = Σ_{x,y} Pr(Y = y)
Pr((X|y) = x)x = EX and E(E(X²|Y)) = E(X²), so the result is just VX.

Page 581, correction to the graffito 

(18 Aug 96) 


Page 594, line 2 

(9 Jul 95) 

− 5.5/(ln n)² + O(log log n/log n)³; then we estimate P_{1000000} ≈ 15480992.8.)

Page 597, line 4 of answer 9.40 

(20 Jun 96) 


n/2 


L 


H 


m— 1 
2k 


n/2 


L 

k=1 


(lnle^k )™- 1 TOflk - 1 (log k) m - 2 ) 


k 


k 
Page 599, lines 1 and 2


(26 Sep 95) 


for all m. Divide by n and let n → ∞; the limit points are bounded above by
α_1 . . . α_m for all m. Finally we have


Page 603, new graffito for answer 9.63 


(31 Dec 97) 


Page 605, lines 16-20 


(29 Mar 97) 


Additional progress 
on this problem has 
been made by Jean- 
Luc Remy, Journal 
of Number Theory, 
vol. 66 (1997), 1-28. 


15 M. D. Atkinson, “The cyclic towers of Hanoi,” Information Processing
Letters 13 (1981), 118-119. 633.

16 M. D. Atkinson, “How to compute the series expansions of sec x and
tan x,” American Mathematical Monthly 93 (1986), 387-389. [See also
A. J. Kempner, “On the shape of polynomial curves,” Tôhoku Math. J.
37 (1933), 347-362; R. C. Entringer, “A combinatorial interpretation of
the Euler and Bernoulli numbers,” Nieuw Archief voor Wiskunde 14
(1966), 241-246.] 635.


Page 608, lines 22 and 23 


(1 Feb 96) 


59 F. R. K. Chung and R. L. Graham, “On the cover polynomial of a di- 
graph,” Journal of Combinatorial Theory, series B, 65 (1995), 273-290. 


Page 612, line 10 


(31 Dec 95) 


lated into French, 1786; German, 1788; Russian, 1936; English, 1988. 


Page 612, line 16 and following 


(31 Dec 95) 


Analysi Finitorum ac Doctrina Serierum. St. Petersburg, Academiae 
Imperialis . . . 
Page 614, line 8

(1 Nov 96) 

132 J. Franel, Solutions to questions 42 and 170, in L’Intermédiaire des

Page 614, line 11 

(31 Dec 95) 

minimal storage tree sorting,” Journal of the Association for Computing 
Machinery 27 (1970), 496-507. 

Page 615, line 12 

(24 May 96) 

145 Angelo Genocchi, “Intorno all’espressione generale de’numeri Bernulliani,”

Page 616, lines 5 and 6 from the bottom 

(12 Feb 96) 

170 Inger Johanne Håland and Donald E. Knuth, “Polynomials involving the
floor function,” Mathematica Scandinavica 76 (1995), 194-200. 

Page 619, after line 2 

(27 May 94) 


202 Johannes Kepler, letter to Joachim Tancke (12 May 1608), in his Gesam- 
melte Werke, volume 16, 154-165. 

[Items previously numbered 202, 203, and 204 are now numbered 203, 204, 
205; the item previously numbered 205 has been dropped.] 


Page 620, line 5 from the bottom (31 Dec 95) 


229 E. E. Kummer, “Über die hypergeometrische Reihe


Page 622, lines 16 and 17 


(11 Apr 96) 


Stirling numbers of the second kind,” Discrete Mathematics 150 (1996), 
281-292. 
Page 622, lines 18 and 19 (28 Mar 97)


249 Li Shan-Lan, Duǒ Jī Bǐ Lèi [Sums of Piles Obtained Inductively]. In his
Zégǔxī Zhāi Suànxué [Classically Inspired Meditations on Mathematics],


Page 623, line 6 (10 Oct 97) 


premier,” Bulletin de la Société mathématique de France 6 (1877), 49-54.


Page 624, line 5 (31 Dec 95) 


Zahlen,” Sitzungsberichte der Mathematisch-Naturwissenschaftlichen


Page 626, line 18 (3 Aug 95) 


308 Paulo Ribenboim, 13 Lectures on Fermat’s Last Theorem. Springer- 


Page 626, line 5 from the bottom (6 Mar 96) 


313 Gian-Carlo Rota, “On the foundations of combinatorial theory. I. The- 


Page 627, line 20 (31 Dec 95) 


ence on Combinatorial Structures and their Applications, 1969.) 


Page 627, lines 10 and 11 from the bottom (1 Nov 96) 


letin International de l’Académie Polonaise des Sciences et des Lettres
(Cracovie), series A (1910), 9-11. 


Page 627, line 6 from the bottom (31 Dec 95) 


328 Wacław Sierpiński, A Selection of Problems in the Theory of Numbers.
Page 628, lines 1 and 2

(1 Nov 94) 

330 N. J. A. Sloane, A Handbook of Integer Sequences. Academic Press, 1973. 
Sequel, with Simon Plouffe, The Encyclopedia of Integer Sequences, Aca- 
demic Press, 1995. 

Page 628, lines 4 and 5 

(4 Mar 96) 

331 A. D. Solov’ev, “Odno kombinatornoe tozhdestvo i ego primenenie k za-
dache o pervom nastuplenii redkogo sobytiia,” Teoriia veroiatnostei

Page 629, lines 1 and 2 

(21 Jan 95) 

344 Volker Strehl, “Binomial identities — combinatorial and algorithmic as- 
pects,” Discrete Mathematics 136 (1994), 309-346. 

Page 630, bottom line 

(21 Jan 95) 

373 Herbert S. Wilf, generatingfunctionology. Academic Press, 1990; second 
edition, 1994. 

Page 633, left column 

(26 Sep 96) 

1.10 Atkinson [15]. 

Page 635, left column 

(26 Sep 96) 

6.75 Atkinson [16]. 

Page 635, left column 

(2 Oct 97) 


6.76 [209, answer 5.1.3-3]; Lengyel [248].
Page 635, right column

(13 Jan 95) 

6.86 [226], 

Page 636, missing entry 

(8 May 96) 

8.45 1985 final exam. 

Page 638, left column 

(26 Sep 96) 

[remove the reference to V. I. Arnol’d] 

Page 638, right column 

(26 Sep 96) 

Atkinson, Michael David, 605, 633, 635. 

Page 639, right column 

(7 Jul 97) 

Blom, Carl Gunnar, 606, 636 

Page 639, right column 

(23 Jan 95) 

Boyd, David William, 564 

Page 640, left column 

(27 Oct 97) 

Cassini, Gian (= Giovanni = Jean) Domenico (= Dominique), 292, 607

Page 641, left column 

(18 Dec 96) 


Concrete Math Club, 74, 453 
Page 642, left column

(1 Oct 94) 

Dirichlet, Johann Peter Gustav Lejeune, 370, 610, 633 

Page 643, left column 

(28 Dec 96) 

Entringer, Roger Charles, 605 

Page 645, left column 

(1 Oct 94) 

Gauß (= Gauss), Johann Friedrich Carl (= Carl Friedrich),

vii, 6, 7, . . . 

Page 647, right column 

(29 Mar 97) 

Kempner, Aubrey John, 605 

Page 648, left column 

(28 Mar 97) 

Lǐ Shanlan Renshu (= Qiuren), 269, 622

Page 648, right column 

(15 Apr 96) 

Minkowski, Hermann, 122 

Page 650, right column 

(29 Aug 96) 

Phi function: sum of φ, 137-139, 462-463

Page 651, right column 

(1 Sep 94) 


Ramanujan Iyengar, Srinivasa, 330 
Page 652, left column (31 Dec 97)

Remy, Jean-Luc, 603 

Page 652, left column (3 Aug 95) 

Ribenboim, Paulo, 555, 626, 634 

Page 653, right column (31 Dec 95) 

Stern, Moritz Abraham, 116, 628 

Page 654, right column (28 Mar 97) 

Sun Tzu (= Sunzǐ, Master Sun), 126

Page 655, right column (15 Dec 96) 

umop-apisdn function, 193. 

Page 656, right column in second printing (1 Mar 95) 

Wiles, Andrew John, 131 

Finally, on the back cover, line 7 (5 Feb 96) 

change ‘indispensible’ to ‘indispensable’. 



